PROGRAMS
GRAMMARS
ARGUMENTS

Joachim Lambek
Contents

1 Calculations on an abacus
1.1 What is a calculation?
1.2 Calculable functions
1.3 Primitive recursive functions are calculable
1.4 Some primitive recursive functions
1.5 The minimization scheme
1.6 Recursive functions and relations
1.7 How to calculate the nth prime

2 More about calculability
2.1 What is a program?
2.2 Calculable functions are recursive
2.3 Digression: enumerations
2.4 Example of a noncalculable function
2.5 Turing programs
2.6 Effective procedures

3 Syntactic types and semigroups
3.1 A simple grammar using types
3.2 Analysis of the English verb phrase
3.3 Syntactic calculus
3.4 Multiplicative systems
3.5 Semigroups and monoids

4 Context-free grammars
4.1 Grammars for some fragments of English
4.2 Production grammars and deductive systems
4.3 Grammars of some formal languages
4.4 New languages from old
4.5 How are sentences produced and recognized?
4.6 Machines for producing and recognizing sentences
4.7 Derivations and parsing trees
4.8 A necessary condition for a language to be context-free

5 Generative and transformational grammars
5.1 Inflection rules: conjugation
5.2 More about tense and verb phrase
5.3 More about the noun phrase
5.4 Passive construction, transformational grammar
5.5 Some other transformations

References

Appendix: A mathematician looks at the French verb
6 Introduction
7 The present tense
8 The other tenses
9 Factorizing the conjugation matrix
10 Radicals of ordinary verbs
11 Radicals of exceptional verbs
12 A production grammar for a fragment of French
Preface
These notes began as notes for a course called "Computability and Mathematical Linguistics" taught at McGill University for about 25 years, beginning in 1974. It was quite successful, but after Professor Lambek and I retired, there was no one sufficiently interested in teaching the course as designed, and it eventually disappeared. There was a proposal to add quite a bit of logic to the notes and publish them jointly with Phil Scott and me, but this project never went much beyond putting the original notes (augmented by the paper on the French verb) into TeX. Now, nearly fifteen years later, Professor Lambek and I decided to put this online in case there is any interest. There are two additional comments by me on pages 82 and 115.
Michael Barr, Montreal, 2007-08-16.
Chapter 1
Calculations on an abacus
1.1 What is a calculation?
Some people say that a calculation is what can be done on an electronic computer, while
others would be satisfied with pencil and paper. To give a precise answer to this question,
Turing in 1936 invented an abstract device, later called a "Turing machine". His idea
proved to be very fruitful, both in answering some theoretical questions on so-called
"word problems" and in stimulating the development of the modern electronic computer.
However, there is an easier concept that goes back to antiquity and will do the same thing.
The word "calculate" is derived from the Latin word for pebble. At one time these
pebbles were placed into grooves on a table, called "abacus" from a Greek word for table.
Let us here define an abacus as consisting of
(a) a potentially infinite number of locations A, B, C . . ., that may be distinguished
from one another, and
(b) a potentially infinite number of pebbles that need not be distinguishable.
Let it also be understood that
(c) each location can hold as many pebbles as one wishes.
It will turn out that for a given calculation only a finite number of locations and
pebbles are needed, but we do not wish to impose a bound on this number beforehand.
In the same way, when talking about calculations with paper and pencil, one does not
want to limit the amount of paper and the number of pencils to be used.
Essentially there are only two operations one can perform on an abacus. One can put
a pebble into a designated location, or one can take a pebble from a designated location.
The first operation can always be performed, the second can only be performed if the
location is not empty. Accordingly there are two elementary instructions, as indicated
by the following diagrams:
[Diagram: the instruction X+, a node with one outgoing arrow, and the instruction Y−, a node with two outgoing arrows.]
The first instruction reads: go to location X, deposit one pebble, then go to wherever
the outgoing arrow points. The second instruction reads: go to location Y, remove one
pebble if this can be done; in that case go to wherever the left arrow points; in the contrary case (that is, when Y is found to be empty), go to wherever the right arrow points. It may
be convenient to consider two further elementary instructions:
start
stop
The first of these says: at the beginning go to wherever the arrow points. The second one
says: stop.
By a program we shall mean a finite number of such elementary instructions fitted
together in a specified manner. We shall not make this precise at the moment; instead
we shall illustrate some simple programs by their flow diagrams.
1.1.1 Example Empty location X.
[Flow diagram: start → X−; the "yes" arrow returns to X−, the "no" arrow leads to stop.]

What does this picture mean? Instruction: remove one pebble from location X. If this can be done, return to the same instruction; otherwise, stop. Suppose, for example, there are 3 pebbles at X to start with; then the successive contents of location X are as follows:

X
3
2
1
0

In general, if the initial content of X is x pebbles, then its final content is 0 pebbles.
1.1.2 Example Transfer content of X to Y.

[Flow diagram: start → X−; the "yes" arrow leads to Y+, whose arrow returns to X−; the "no" arrow leads to stop.]

What does this flow diagram mean?
Instruction (1): remove one pebble from location X. If this can be done, go to instruction (2). If it cannot be done, stop.
Instruction (2): add one pebble to location Y. Then go to instruction (1).
For example, suppose there are 2 pebbles at location X and 3 pebbles at location Y to start with. Then the successive contents of those two locations are as follows:

X Y
2 3
1 4
0 5

We shall use the word configuration to denote a complete listing of the contents of all locations, but we only mention those locations which are nonempty at some stage of a calculation. In the present example, an initial configuration of x pebbles at X and y pebbles at Y is transformed by the program to a terminal configuration of 0 pebbles at X and y + x pebbles at Y.
1.1.3 Example Copy content of X into Y.

[Flow diagram: start → X−; "yes" leads to Y+ and then T+, returning to X−; "no" leads to a transfer of T back to X, then stop.]
In discussing the performance of this program, we assume that T is empty at the
beginning of the calculation. It follows that T is also empty at the end. We call T a temporary storage location. It serves the same purpose as scratch paper when calculating
with paper and pencil.
[Flow diagram: the same program, with the subroutine "T → X" shown as a box.]
In the above flow diagram, we have used T → X, which means "transfer content of T to X", as a so-called subroutine. In accordance with Example 1.1.2, the rectangle is to be replaced by the corresponding flow diagram. Note that the start and stop instructions within the subroutine should not be included, although the arrows going to and from them serve as the entry and exit points of the subroutine. We will always put a subroutine in a box.
Assuming the same initial configuration as in Example 1.1.2, with T empty, here are the successive configurations for Example 1.1.3:

X Y T
2 3 0
1 4 1
0 5 2
2 5 0
The astute reader will have noticed that the programs of Examples 1.1.2 and 1.1.3
could have served stone-age man in the task of adding two numbers.
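For readers who like to experiment, here is a minimal Python sketch of the abacus. The dictionary representation and the function names are my own, not the text's; the two loops mirror Examples 1.1.2 and 1.1.3 exactly.

```python
# Locations map names to pebble counts; the loops mimic the X+ and Y-
# instructions of the text.

def transfer(cfg, src, dst):
    """Example 1.1.2: move every pebble from src to dst, one at a time."""
    while cfg.get(src, 0) > 0:               # src-: succeeds only if nonempty
        cfg[src] -= 1
        cfg[dst] = cfg.get(dst, 0) + 1       # dst+: always succeeds

def copy(cfg, src, dst, tmp="T"):
    """Example 1.1.3: copy src into dst, using temporary storage tmp."""
    while cfg.get(src, 0) > 0:               # drain src, feeding dst and tmp
        cfg[src] -= 1
        cfg[dst] = cfg.get(dst, 0) + 1
        cfg[tmp] = cfg.get(tmp, 0) + 1
    transfer(cfg, tmp, src)                  # restore src; tmp ends empty

cfg = {"X": 2, "Y": 3}
copy(cfg, "X", "Y")
print(cfg)   # {'X': 2, 'Y': 5, 'T': 0} -- addition, stone-age style
```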
Exercises
1. What do the following programs do?

[Three flow diagrams, not reproduced: each begins with start and ends with stop, built from the instructions X−, X+, Y+ and Y−.]
(In the second and third diagrams the stop instruction has been written more than once to simplify the drawing of the diagrams. Nonetheless, there is only one stop. This is
necessary in order that when a program is used as a subroutine, it will have just one
exit.)
2. Draw the flow diagram for a program which will result in interchanging the contents
of locations X and Y.
1.2 Calculable functions
It is convenient to include 0 among the natural numbers. The set of natural numbers is thus N = {0, 1, 2, ...}. By a numerical function of one variable we mean a mapping f : N → N, and we write y = f(x) to indicate that the value of f at x is y. By a numerical function of two variables we mean a mapping f : N × N → N, and we write z = f(x, y) to indicate that f assigns to the pair (x, y) the value z. We are interested in calculating numerical functions of any number of variables on an abacus.
We shall say that a program calculates the function y = f(x) with the help of the temporary storage location T if it transforms the configuration

X Y T
x 0 0

to

X Y T
x f(x) 0

It is assumed that Y and T are empty at the beginning of the calculation. T may be occupied at some stages during the calculation, but must be empty at the end. No other locations are supposed to be involved in the calculation, as we did not specify any. The fact that X has the same content at the end of the calculation as at the beginning is required for technical reasons: we don't want the input to be lost. Otherwise we could have permitted the final configuration

X Y T
0 f(x) 0
We will denote by Y := f(x) a machine that assigns to the location Y the value of the function f evaluated at the contents x of the location X and leaves all other locations unchanged, that is, the machine that calculates y = f(x). Similarly, a machine that calculates z = f(x, y) will be denoted Z := f(x, y).
For example, the identity function y = x, the successor function y = S(x) = x + 1,
and the constant function y = 3 are calculated by the following three programs:

[Flow diagrams: (i) start → [copy X into Y] → stop; (ii) start → [Y := x] → Y+ → stop; (iii) start → Y+ → Y+ → Y+ → stop.]
This is shown by the initial and final configurations:

X Y T        X Y T           X Y T
x 0 0        x 0 0           x 0 0
x x 0        x S(x) 0        x 3 0
Note that the subroutine copy X into Y is exactly the same as Y := x.
Hard-working students are invited to draw the flow diagram for the constant function y = 17.
Aside from such trivial functions, what other functions can be calculated on an abacus?
Consider the following numerical functions:
y = the (x + 1)st prime number,
y = the number of primes less than x,
y = the (x + 1)st digit of π.
All these can be calculated on the abacus, and for two of them we shall construct
programs later. In fact, at the moment, the reader will probably not be able to think of
a numerical function that cannot be calculated on an abacus.
More generally, we say that a program calculates a numerical function z = g(x_1, x_2, ..., x_n) of n variables with the help of temporary storage locations T_1, T_2, ..., T_k, if its initial and final configurations are as follows:

X_1 X_2 ... X_n  Z                 T_1 ... T_k
x_1 x_2 ... x_n  0                 0   ... 0

X_1 X_2 ... X_n  Z                 T_1 ... T_k
x_1 x_2 ... x_n  g(x_1, ..., x_n)  0   ... 0
For example, here is a simple-minded program for calculating z = x + y:

[Flow diagram: start → [copy X into Z] → [copy Y into Z] → stop.]
We shall presently give another program for calculating addition which is more sophisticated, but which has the advantage of serving as a model for programs that will calculate multiplication and exponentiation.
What is meant by the sum x + y of two numbers? We do know that x + 0 = x. Moreover, if we happen to know the value of x + 173, then we also know that x + 174 is one more.
We shall write "S(y)" for the successor of y. (We are not allowed to call this y + 1 until after addition has been defined.) We then have quite generally

x + S(y) = S(x + y)
The sum of two numbers is now completely determined, once we decide to define the natural numbers by

1 = S(0), 2 = S(1), 3 = S(2), ... etc.

For example, we have

3 + 2 = 3 + S(1) = S(3 + 1)

where

3 + 1 = 3 + S(0) = S(3 + 0) = S(3) = 4

hence

3 + 2 = S(4) = 5
The definition of addition discussed here is called a recursive definition.
We shall now present programs for the functions z = x + y, z = x * y, z = x^y. Note that these functions are defined recursively as follows:

x + 0 = x                 x * 0 = 0                 x^0 = 1
x + S(y) = S(x + y)       x * S(y) = x * y + x      x^S(y) = x^y * x
where 1 = S(0).
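These recursions translate directly into code. The following Python sketch (the function names are mine) uses nothing but the successor and the three recursions above:

```python
def S(y):          # successor
    return y + 1

def add(x, y):     # x + 0 = x,  x + S(y) = S(x + y)
    return x if y == 0 else S(add(x, y - 1))

def mul(x, y):     # x * 0 = 0,  x * S(y) = x * y + x
    return 0 if y == 0 else add(mul(x, y - 1), x)

def power(x, y):   # x^0 = 1,  x^S(y) = x^y * x
    return 1 if y == 0 else mul(power(x, y - 1), x)

print(add(3, 2), mul(3, 2), power(3, 2))   # 5 6 9
```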
Here is the program to calculate z = x + y:

[Flow diagram, only partly recoverable: start → [Z := x], followed by a loop that removes one pebble from Y at each pass and replaces the content of Z by its successor, using the subroutines "transfer Z to U" and "calculate z = S(u)"; when Y is found empty, it is restored from temporary storage and the program stops.]

Some of the configurations which appear during the calculation are shown in the original by a table, only partly recoverable here: X stays at x throughout, Y runs down from y to 0 and is restored to y at the end, Z runs up x, x + 1, ..., x + y, and T and U hold intermediate values. In addition to the temporary storage locations T and U, there are also other temporary storage locations which are used in the subroutines.
To calculate z = x * y and z = x^y, one need only replace the subroutine Z := x in the above program by Z := u + x and Z := u * x, respectively.
Exercises
1. Draw the flow diagram of a program for the function z = x^2. (You may regard this as a special case of z = x * y or of z = x^y.)
2. Which function y = f(x) is calculated by the following program?

[Flow diagram: start → [calculate t = x] → [calculate u = x + 1] → [calculate y = t · u] → [empty T] → [empty U] → stop.]
3. Draw the flow diagram of a program to calculate y = x^2 + 2x + 2.
4. The function z = x ↑ y is defined thus: x ↑ 0 = 1, x ↑ S(y) = x^(x ↑ y). For instance, x ↑ 3 = x^(x^x). Write out a program.
1.3 Primitive recursive functions are calculable
The functions x + y, x * y, and x ↑ y are all examples of "primitive recursive" functions. Probably all the numerical functions you can think of are primitive recursive. By a numerical function of k variables we understand a function y = f(x_1, x_2, ..., x_k), where x_1, x_2, ..., x_k and y all range over the set of natural numbers. When k = 0, this reduces to y = c, where c is a natural number.
We shall now list a number of numerical functions and some schemes for producing
new functions from known functions.
R-1. successor function: f(x) = S(x)

R-2. constant functions: f(x_1, ..., x_n) = k

R-3. projection functions: f(x_1, ..., x_n) = x_k

R-4. substitution scheme:

f(x_1, ..., x_n) = g(h_1(x_1, ..., x_n), ..., h_k(x_1, ..., x_n))
R-5. recursion scheme:

f(x_1, ..., x_{n−1}, 0) = g(x_1, ..., x_{n−1})
f(x_1, ..., x_{n−1}, S(y)) = h(x_1, ..., x_{n−1}, y, f(x_1, ..., x_{n−1}, y))
Remarks. R-2 and R-3 are families of functions, one for each pair of natural numbers n and k. R-3 includes the identity function f(x) = x as a special case, when n = k = 1. R-4 produces one function f of n variables, given one function g of k variables and k functions h_1, ..., h_k of n variables. R-5 recursively defines a function f of n variables, given a function g of n − 1 variables and a function h of n + 1 variables. Special cases of R-5 are the following:

R-5, case n = 1:
f(0) = c
f(S(y)) = h(y, f(y))

R-5, case n = 2:
f(x, 0) = g(x)
f(x, S(y)) = h(x, y, f(x, y))
Informally we may define primitive recursive functions as all functions of types R-1, R-2 and R-3, as well as all functions derivable from these by schemes R-4 and R-5. More formally, a mathematician would put this as follows:

1.3.1 Definition The class of primitive recursive functions is the smallest class of numerical functions containing all functions of types R-1, R-2 and R-3 and closed under schemes R-4 and R-5.

To say that a class of numerical functions is closed under scheme R-4 means that if h_1(x_1, ..., x_n), ..., h_k(x_1, ..., x_n) and g(y_1, ..., y_k) are in the class, then so is f(x_1, ..., x_n) as defined by R-4.

To say that a class of numerical functions is closed under scheme R-5 means that if g(x_1, ..., x_{n−1}) and h(x_1, ..., x_{n−1}, y, z) are in the class, then so is f(x_1, ..., x_{n−1}, y) as defined by R-5.
For example, why is x + y a primitive recursive function? We recall that the recursive definition of x + y reads

x + 0 = x
x + S(y) = S(x + y)

It will be convenient to denote the primitive recursive function defined by R-3 by p_k^n; thus

p_k^n(x_1, ..., x_n) = x_k

We note that x = p_1^1(x) and that x + y = p_3^3(x, y, x + y), hence the recursive definition of x + y may be written thus:

x + 0 = p_1^1(x)
x + S(y) = S(p_3^3(x, y, x + y))
Then f(x, y) = x + y is primitive recursive by R-5, case n = 2, if we take g(x) = p_1^1(x) and h(x, y, z) = S(p_3^3(x, y, z)). Here g is primitive recursive by R-3, and h is primitive recursive by R-4, with n = 3 and k = 1, since S is primitive recursive by R-1 and p_3^3 is primitive recursive by R-3.

Similarly one shows that x * y is primitive recursive. Now consider x!; this satisfies the recursive definition

0! = 1
S(x)! = S(x) * x!

Then f(x) = x! is primitive recursive by R-5, case n = 1, if we take c = 1 and h(y, z) = S(y) * z = S(p_1^2(y, z)) * p_2^2(y, z). Here h is primitive recursive by R-4, with n = 2 and k = 2, since multiplication is primitive recursive by the above observation, and S and the projections are primitive recursive by R-1 and R-3 respectively.
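Schemes R-4 and R-5 can also be written as higher-order functions. The following Python sketch is mine, not the book's; it shows only the case n = 2 of R-5, and rebuilds x + y exactly as derived above.

```python
def substitute(g, *hs):                       # R-4 (substitution)
    return lambda *xs: g(*(h(*xs) for h in hs))

def recurse(g, h):                            # R-5, case n = 2
    def f(x, y):
        return g(x) if y == 0 else h(x, y - 1, f(x, y - 1))
    return f

S   = lambda x: x + 1                         # R-1
p11 = lambda x: x                             # projection p^1_1
p33 = lambda x, y, z: z                       # projection p^3_3

# x + y, with g = p^1_1 and h = S o p^3_3 as in the text
add = recurse(p11, substitute(S, p33))
print(add(3, 4))    # 7
```

The factorial could be built the same way from the case n = 1 of R-5.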
1.3.2 Theorem Every primitive recursive function is calculable.

Proof We must show that the class of calculable functions contains all numerical functions of types R-1, R-2, and R-3, and that it is closed under schemes R-4 and R-5. For concreteness, we take n = 3 and k = 2 in R-2, R-3, and R-4, and n = 2 in R-5. The proof then consists in inspecting the following five programs.
R-1. Program for y = S(x):

[Flow diagram: start → [copy X into Y] → Y+ → stop.]
R-2. Program for w = f(x, y, z) = 2:

[Flow diagram: start → W+ → W+ → stop.]
R-3. Program for w = f(x, y, z) = y:

[Flow diagram: start → [copy Y into W] → stop.]
R-4. Program for w = g(h_1(x, y, z), h_2(x, y, z)):

[Flow diagram: start → [calculate u = h_1(x, y, z)] → [calculate v = h_2(x, y, z)] → [calculate w = g(u, v)] → [empty U] → [empty V] → stop.]
R-5. Program to calculate z = f(x,y), when
f(x,0) = g(x)
f(x,S(y)) = h(x,y,f(x,y))
[Flow diagram, only partly recoverable: start → [calculate z = g(x)], followed by a loop on Y that replaces the content of Z using the subroutine "calculate z = h(x, y, u)".]
For example, take f(x, y) = x + y. Then we have seen that g(x) = x and h(x, y, u) = S(u). Hence the above program specializes to that given for x + y in Section 1.2. The same goes for x * y and x^y.
Exercise
1. What function in one or two variables is calculated by the following program? Explain
why this is so.
[Flow diagram, only partly recoverable: it contains the subroutine "calculate z = u + 2y + 1".]
1.4 Some primitive recursive functions
The following list of primitive recursive functions is taken from the book by Kleene. We have already met (1) to (4). (5) deals with the "predecessor function", the predecessor of 0 being defined exceptionally as zero. (6) deals with the "naive difference" between x and y, which is considered to be 0 when y > x. (7) and (8) deal with the usual maximum and minimum of a pair of numbers. (9) is called the "delta function", (10) the "sign function", and (11) the "absolute difference".
(1) x + y:
x + 0 = x
x + S(y) = S(x + y)

(2) x * y:
x * 0 = 0
x * S(y) = (x * y) + x

(3) x^y:
x^0 = 1
x^S(y) = x^y * x

(4) x!:
0! = 1
S(x)! = x! * S(x)

(5) pre(x) = x − 1 if x > 0, and 0 if x = 0:
pre(0) = 0
pre(S(x)) = x

(6) x ∸ y = x − y if x ≥ y, and 0 if x < y:
x ∸ 0 = x
x ∸ S(y) = pre(x ∸ y)

(7) min(x, y) = x ∸ (x ∸ y)

(7′) min(x, y, z) = min(min(x, y), z)

(8) max(x, y) = (x + y) ∸ min(x, y). Alternative definition: max(x, y) = x + (y ∸ x)

(8′) max(x, y, z) = max(max(x, y), z)

(9) δ(x) = 1 if x = 0, and 0 if x > 0:
δ(0) = 1
δ(S(x)) = 0
Alternative definition: δ(x) = 1 ∸ x

(10) sgn(x) = 0 if x = 0, and 1 if x > 0:
sgn(0) = 0
sgn(S(x)) = 1
Alternative definitions: sgn(x) = 1 ∸ δ(x) = min(x, 1)

(11) |x − y| = (x ∸ y) + (y ∸ x)
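A quick way to check the identities above is to transcribe them into Python. The names below are mine; the naive difference is written monus.

```python
def pre(x):            # (5) predecessor, with pre(0) = 0
    return x - 1 if x > 0 else 0

def monus(x, y):       # (6) naive difference
    return x - y if x >= y else 0

def mn(x, y):          # (7) min via monus
    return monus(x, monus(x, y))

def mx(x, y):          # (8) max via monus
    return monus(x + y, mn(x, y))

def delta(x):          # (9) delta function: 1 at 0, else 0
    return monus(1, x)

def sgn(x):            # (10) sign function: 0 at 0, else 1
    return mn(x, 1)

def absdiff(x, y):     # (11) |x - y|
    return monus(x, y) + monus(y, x)

assert (mn(3, 5), mx(3, 5), delta(0), sgn(7), absdiff(2, 6)) == (3, 5, 1, 1, 4)
```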
The first three functions above suggest a whole sequence of functions f_t(x, y), t = 0, 1, 2, 3, ..., where f_0(x, y) = x + y, f_1(x, y) = x * y, f_2(x, y) = x^y, and f_3(x, y) = x ↑ y, which made an appearance in one of the exercises. We are led to define f_{S(t)}(x, y) thus:

f_{S(t)}(x, 0) = sgn(t)
f_{S(t)}(x, S(y)) = f_t(x, f_{S(t)}(x, y))

It is easily seen, by mathematical induction on t, that each of the functions f_t(x, y) of two variables is primitive recursive.

Let us now consider a function of three variables defined as follows:

f(t, x, y) = f_t(x, y)

In other words,

f(0, x, y) = x + y
f(S(t), x, 0) = sgn(t)
f(S(t), x, S(y)) = f(t, x, f(S(t), x, y))

It is clear that this function is defined recursively, in some sense of the word "recursive". In fact it can be shown that it is calculable on our abacus. Yet it is not primitive recursive, as was shown by Ackermann. Putting g(x) = f(x, x, x), he proved that, for every primitive recursive function h(x), there exists a natural number x such that g(x) > h(x). Thus g(x) cannot be primitive recursive, hence f(t, x, y) cannot be primitive recursive. (Actually, Ackermann considered not this function, but a very similar one.)
Exercise
1. Show how to calculate functions (5) to (11) above, by drawing flow diagrams of programs for the abacus. Make use of subroutines freely.
1.5 The minimization scheme
The reader will recall the substitution scheme R-4 and the recursive scheme R-5, which
played a role in the definition of the class of primitive recursive functions. Another useful
scheme is the following minimization scheme:
R-6. f(x_1, ..., x_n) = smallest y such that g(x_1, ..., x_n, y) = 0.

Given a numerical function g of n + 1 variables, this defines a new numerical function f of n variables provided the following condition is satisfied:

(*) ∀x_1 ... ∀x_n ∃y, g(x_1, ..., x_n, y) = 0

(Here "∀" means "for all" and "∃" means "there exists".)

If condition (*) is not satisfied, then f is not a numerical function, but only a partial numerical function; its domain is a proper subset of N × ... × N, the set of all n-tuples of natural numbers.
1.5.1 Proposition In the minimization scheme, if g is calculable so is f.

Proof To make things concrete, let us consider the special case n = 1. Then R-6 becomes:

f(x) = smallest y such that g(x, y) = 0

and the condition reads

(*) ∀x ∃y, g(x, y) = 0

Now consider the following program:

[Flow diagram: start → [calculate z = g(x, y)]; if Z is empty, stop; otherwise empty Z, add a pebble to Y, and recompute z = g(x, y).]

The successive configurations are:

X Y
x 0     g(x, 0) ≠ 0
x 1     g(x, 1) ≠ 0
...
x y     g(x, y) = 0
We note that when g(x, y) = 0 for the first time, then y = f(x); hence the program does calculate the function f at x. Suppose however that there does not exist a natural number y such that g(x, y) = 0; then the above calculation does not terminate, and f is not defined at x. We observe that Z is a temporary storage location.
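The program above embodies what is nowadays called the μ-operator. Here is a Python sketch of it (the helper names are mine); as in the proof, the loop fails to terminate exactly where condition (*) fails.

```python
def minimize(g):
    def f(x):
        y = 0
        while g(x, y) != 0:   # test g(x, 0), g(x, 1), ... in turn
            y += 1
        return y              # first y with g(x, y) = 0
    return f

# Example 1.5.3 below: [x/2] as the smallest y with (x + 1) monus 2(y + 1) = 0
monus = lambda a, b: a - b if a >= b else 0
half = minimize(lambda x, y: monus(x + 1, 2 * (y + 1)))
print([half(x) for x in range(7)])   # [0, 0, 1, 1, 2, 2, 3]
```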
1.5.2 Example Take g(x, y) = x ∸ y. Then f(x) = x. Thus f is the identity function, a special case of R-3. This example illustrates that a function defined by means of scheme R-6 may very well be definable without using scheme R-6.
1.5.3 Example We recall that, for any real number ξ, [ξ] denotes the greatest integer not exceeding ξ. Thus [ξ] = n if and only if n ≤ ξ < n + 1. [ξ] may also be regarded as the smallest integer n such that n + 1 > ξ.

Let us calculate the function y = [x/2] with the help of the minimization scheme. Indeed,

[x/2] = greatest y such that 2y ≤ x
      = smallest y such that 2(y + 1) > x
      = smallest y such that 2(y + 1) ≥ x + 1
      = smallest y such that (x + 1) ∸ 2(y + 1) = 0.

Clearly, condition (*) is satisfied in this case. A program for calculating y = [x/2] is thus given by the following flow diagram, where Z is a temporary storage location:
given by the following flow diagram, where Z is a temporary storage location:
start
I
1.5.4 Example More generally we may try to calculate

[x/y] = greatest z such that z * y ≤ x
      = smallest z such that (z + 1) * y > x
      = smallest z such that (z + 1) * y ≥ x + 1
      = smallest z such that (x + 1) ∸ (z + 1) * y = 0
The program will be the following, using W as a temporary storage location:

[Flow diagram, not reproduced: the minimization program with g(x, y, z) = (x + 1) ∸ (z + 1) * y.]
Of course, [x/0] is undefined. In fact, it is easily seen that a calculation with this program does not terminate when y = 0.
Exercises
1. Using the minimization scheme, show how to calculate y = [√x] and y = [log_10(x + 1)]. Draw the flow diagrams.

2. What function is calculated by the following program? Explain how you got your answer.

[Flow diagram, not reproduced.]
1.6 Recursive functions and relations
We shall make a definition.
1.6.1 Definition The class of recursive functions is the smallest class of numerical functions which contains all functions of types R-1, R-2, and R-3, and which is closed under schemes R-4, R-5, and R-6, where R-6 satisfies condition (*).
1.6.2 Theorem Every recursive function is calculable.
Proof This is an immediate consequence of Theorem 1.3.2, which asserts that every
primitive recursive function is calculable, and of Proposition 1.5.1 which asserts that the
class of calculable functions is closed under the minimization scheme. ■
1.6.3 Definition The class of partial recursive functions is the smallest class of partial numerical functions which contains all functions of types R-1, R-2 and R-3, and which is closed under schemes R-4, R-5, and R-6, where R-6 need not satisfy condition (*).

For example, z = [x/y] is only a partial recursive function, as it is undefined when y = 0. On the other hand, z = [x/(y + 1)] is always defined, hence it is a recursive function.
We can artificially introduce a function

[x/y]′ = 0 when y = 0, and [x/y] when y > 0

Then

[x/y]′ = smallest z such that y = 0 or (z + 1) * y > x

It will become clear later that this equation can be written in the form

[x/y]′ = smallest z such that g(x, y, z) = 0

where g is a suitable recursive function. Thus z = [x/y]′ will be a recursive function. (Actually, it is even a primitive recursive function, as is seen by looking at the book by Kleene.)
We write "R(x, y)" to say that x bears the relation R to y. "R(x, y)" is a statement capable of being true or false, and we might think of R as a function from N × N to {T, F}, the set of truth values. We regard R as a binary relation, and this is a special case of an n-ary relation, when n = 2. Unary relations, when n = 1, are also called properties.

If R and S are relations, binary for the sake of definiteness, we define new relations R ∧ S, R ∨ S and ¬R by saying that R ∧ S (read "R and S") holds at (x, y) if and only if both R(x, y) and S(x, y), R ∨ S (read "R or S") holds at (x, y) if and only if at least one of R(x, y) or S(x, y) does, and ¬R (read "not R") holds at (x, y) if and only if R(x, y) fails. We also say that R ∧ S is the conjunction of R and S, that R ∨ S is the disjunction of R and S, and that ¬R is the negation of R.

We shall now define what we mean by a relation or property being recursive. For concreteness we restrict attention to the cases n = 2 and n = 1.
1.6.4 Definition A binary relation R (property P) is said to be recursive if there is a recursive function g such that R(x, y) if and only if g(x, y) = 0 (P(x) if and only if g(x) = 0) for all natural numbers x and y.

For example, ≥ is a recursive relation, since x ≥ y if and only if y ∸ x = 0, and equality is a recursive relation, since x = y if and only if |x − y| = 0. The property of being even is recursive, since x is even if and only if [x/2] * 2 = x, that is, |[x/2] * 2 − x| = 0.
The usefulness of this definition arises from the fact that one can apply the minimization scheme to a recursive relation to produce a recursive function:

f(x) = smallest y such that R(x, y)
     = smallest y such that g(x, y) = 0

The condition (*) of Section 1.5 then becomes:

∀x ∃y R(x, y)
1.6.5 Proposition The class of recursive relations (properties) is closed under conjunction, disjunction, and negation.

Proof Suppose

R(x, y) if and only if g(x, y) = 0

and

S(x, y) if and only if h(x, y) = 0

where g and h are recursive functions. Then

R(x, y) ∧ S(x, y) if and only if g(x, y) + h(x, y) = 0
R(x, y) ∨ S(x, y) if and only if g(x, y) * h(x, y) = 0

and

¬R(x, y) if and only if 1 ∸ g(x, y) = 0

Since +, * and ∸ are all recursive functions, it follows that g(x, y) + h(x, y), g(x, y) * h(x, y) and 1 ∸ g(x, y) are all recursive. ■
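The proof suggests a small calculus of characteristic functions. Here is an illustrative Python sketch (the combinator names are mine; a value of 0 plays the role of "true", as in the definition):

```python
monus = lambda a, b: a - b if a >= b else 0

def AND(g, h):  return lambda x, y: g(x, y) + h(x, y)
def OR(g, h):   return lambda x, y: g(x, y) * h(x, y)
def NOT(g):     return lambda x, y: monus(1, g(x, y))

ge = lambda x, y: monus(y, x)                   # x >= y  iff  y monus x = 0
eq = lambda x, y: monus(x, y) + monus(y, x)     # x = y   iff  |x - y| = 0

gt = AND(ge, NOT(eq))                           # x > y, built by the proposition
print(gt(5, 3), gt(3, 3), gt(2, 3))             # 0 1 1  (0 means "holds")
```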
It is now clear that the function z = [x/y]′ defined earlier is recursive. In fact

[x/y]′ = smallest z such that y = 0 or (z + 1) * y ≥ x + 1
       = smallest z such that y = 0 or (x + 1) ∸ (z + 1) * y = 0
       = smallest z such that y * ((x + 1) ∸ (z + 1) * y) = 0.
Exercise
Show that the following properties and relations are recursive, in each case producing a function g as in Definition 1.6.4.

1. x is odd.

2. x is a perfect square, that is, x = [√x]^2.

3. x < y.

4. x ≠ y.

5. x < y < z (ternary relation).
1.7 How to calculate the nth prime
One says that y divides x, and writes y | x, when ∃z (y * z = x). When y ≠ 0, this is equivalent to saying that y * [x/y] = x. On the other hand, 0 | x only when x = 0. Thus, in any case, y | x if and only if y * [x/y]′ = x, if and only if x ∸ (y * [x/y]′) = 0. Therefore the relation "divides" is recursive.
Let us now look at the property of being prime. A natural number x is called prime, and we write prim(x), provided x > 1 and x has no divisor y > 1 other than x. In other words, prim(x) if and only if

x > 1 and x = smallest y such that y > 1 and y | x

Unfortunately, when x = 1, there is no y such that y > 1 and y | x, hence the minimization scheme which appears in the above definition of prim(x) defines only a partial function. To make this into a function we write

prim(x) if and only if x > 1 ∧ x = φ(x)

where

φ(x) = smallest y such that x ≤ 1 ∨ (y > 1 ∧ y | x)
     = smallest y such that g(x, y) = 0

where

g(x, y) = (x ∸ 1) * ((2 ∸ y) + (x ∸ y * [x/y]′))

Once this has been checked, it follows that

prim(x) if and only if 2 ∸ x = 0 ∧ x ∸ φ(x) = 0
        if and only if ψ(x) = 0

where

ψ(x) = (2 ∸ x) + (x ∸ φ(x))
The two functions φ(x) and ψ(x) are of no importance except that they serve to show that prim is a recursive property.

The reader will be familiar with the sequence of prime numbers

2, 3, 5, 7, 11, ...

We wish to put 2 = p(0), 3 = p(1), etc. We thus write p(x) for the (x + 1)st prime. The function y = p(x) may be defined recursively thus:

p(0) = 2
p(S(x)) = smallest y such that χ(p(x), y) = 0

where

χ(u, y) = ψ(y) + ((u + 1) ∸ y)

We must check that ∀u ∃y, χ(u, y) = 0. This amounts to showing that for each natural number x there exists a prime number y which is greater than p(x). This fact was shown by Euclid, who observed that

1 + p(0) * p(1) * ··· * p(x)

is not divisible by the first x + 1 prime numbers p(0), p(1), ..., p(x) (since it leaves a remainder of 1 when divided by any of them). Knowing that it must be divisible by some prime number y (every integer > 1 is), he deduced that y > p(x).

Therefore y = p(x) is a recursive function. Since all recursive functions are calculable, it follows that there is a program to calculate y = p(x) on the abacus. If the reader wishes to take the trouble, he could draw a flow diagram for this program, making use of known subroutines.

We can also calculate the function

π(x) = number of primes < x

We note that this has the following recursive definition:

π(0) = 0
π(S(x)) = π(x) + δ(ψ(x))

since δ(ψ(x)) = 1 if ψ(x) = 0, that is, if x is a prime, and 0 otherwise.
One may also define p(x) in terms of π(x) by noting that

p(x) = smallest y such that x < π(y + 1)

For technical reasons we shall later require a certain function of two variables exp(x, y), but we prefer to write exp_x(y). By this we mean the exponent of x in y, that is, the greatest z for which x^z divides y. For example, exp_5(100) = 2, since 5^2 divides 100 but 5^3 does not. In fact, we may put

exp_x(y) = smallest z such that ¬(x^{z+1} | y)

utilizing the minimization scheme, at least when x ≠ 1 and y ≠ 0. The only trouble is that condition (*) of Section 1.5 may be violated: x^{z+1} may divide y for all z. This will indeed be the case when x = 1 or y = 0. We arbitrarily put exp_x(y) = 0 in these two exceptional cases. We now end up with the following definition:

exp_x(y) = smallest z such that x = 1 ∨ y = 0 ∨ ¬(x^{z+1} | y)
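In code the definition, exceptional cases included, looks as follows (a sketch; note that for x = 0 and y ≠ 0 the answer is likewise 0, since 0^1 already fails to divide y):

```python
def exp(x, y):
    if x <= 1 or y == 0:             # x = 0, x = 1 or y = 0: value is 0
        return 0
    z = 0
    while y % (x ** (z + 1)) == 0:   # smallest z with not(x^(z+1) | y)
        z += 1
    return z

print(exp(5, 100), exp(2, 48), exp(1, 7), exp(3, 0))   # 2 4 0 0
```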
Exercises
1. Draw a flow diagram for a program which will calculate the function ψ(x) of the text, which has the property that x is a prime if and only if ψ(x) = 0.

2. Using the above exercise as a subroutine, exhibit the flow diagram of a program which will calculate π(x), the number of primes less than x.

3. Using either of the above, draw the flow diagram of a program for calculating p(x), the (x + 1)st prime.
Chapter 2
More about calculability
2.1 What is a program?
In Section 1.1 we described programs for an abacus informally with the help of flow
diagrams. We now wish to give a precise definition. Let us begin by looking at an
example. We recall the program "transfer content of X to Y":

[Flow diagram: start → X−; "yes" leads to Y+, which returns to X−; "no" leads to stop.]

Let us assign numerical labels to the nodes of this flow diagram (which correspond to the instructions of the program) other than those already labelled "start" and "stop":

[The same diagram, with X− labelled node 1 and Y+ labelled node 2.]
The program can now be represented by the following table:

          start   1       2
yes       1       2       1
no        1       stop    1
location          X       Y
action            minus   plus
This table tells us the following: At node 1, carry out the operation X−. Can it be done? Yes: then go to node 2; no: then go to node stop. At node 2, carry out the operation Y+. Can it be done? Yes: then go to node 1; no: then also go to node 1. (Of course the negative alternative cannot really occur.) At node start, do nothing to any location and go to node 1 in any case.
Suppose, for example, we are given the following table:

          start   1       2      3
yes       2       stop    3      2
no        2       stop    3      1
location          A       B      C
action            minus   plus   minus

It is clear that this can only correspond to the following flow diagram, probably representing a pretty stupid program:

[Flow diagram, not reproduced.]
Let us write "α" for "yes", "β" for "no", "γ" for "location" and "ε" for "action". Then α, β, γ and ε are clearly functions. To describe the source and target of each of these functions, we let L be the set of locations, and we assume that there is given a finite set N of nodes,

N = {start, 1, 2, ..., k, stop}

Then

α : N − {stop} → N − {start}
β : N − {stop} → N − {start}
γ : N − {stop, start} → L
ε : N − {stop, start} → {plus, minus}

(Here N − {stop} is short for {start, 1, 2, ..., k}, etc.)

A program may now be defined as a quadruple (α, β, γ, ε) subject to the restriction that α(n) = β(n) if n = start or if ε(n) = plus.
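The quadruple can be coded as a table and executed directly. Here is a hedged Python sketch (the dictionary layout is my own convention), using the "transfer X to Y" table above:

```python
program = {                  # node: (yes, no, location, action)
    "start": ("1", "1", None, None),
    "1":     ("2", "stop", "X", "minus"),
    "2":     ("1", "1", "Y", "plus"),
}

def run(program, cfg):
    node = "start"
    while node != "stop":
        yes, no, loc, act = program[node]
        if act == "plus":
            cfg[loc] = cfg.get(loc, 0) + 1
            node = yes                      # plus always succeeds
        elif act == "minus":
            if cfg.get(loc, 0) > 0:
                cfg[loc] -= 1
                node = yes
            else:
                node = no
        else:
            node = yes                      # start: alpha = beta
    return cfg

print(run(program, {"X": 2, "Y": 3}))       # {'X': 0, 'Y': 5}
```

The same run function will execute any table of this shape.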
Exercises
1. Make up a table for the program "copy A into B".
2. Draw the flow diagram corresponding to the following table:

          start   1      2       3      4
yes       1       2      stop    4      3
no        1       2      3       4      stop
location          A      B       C      B
action            plus   minus   plus   minus
2.2 Calculable functions are recursive
Let there be given a program (α, β, γ, ε) for the abacus. We have talked about the finite set of nodes

N = {start, 1, 2, ..., k, stop}

and the infinite set of locations

L = {A, B, C, ...}

For our present purpose it will be more convenient if both nodes and locations are denoted by numbers, thus we let

N = {0, 1, 2, ..., k, k + 1}
L = {1, 2, 3, ...}
A stage of a calculation is described by the node n and the configuration (n_1, n_2, ...), where n_i is the number of pebbles at the location i. It is understood that all but a finite number of locations are empty, so that all but a finite number of the n_i are zero. With each stage we associate its so-called Gödel number

s = 2^n 3^{n_1} 5^{n_2} ···

where all but a finite number of factors are equal to 1.

Given any positive integer s, by factoring it into prime powers, we can read off the stage whose Gödel number it is (assuming the exponent of 2 does not exceed k + 1). Note that the factorization of a positive integer into prime powers is unique.
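A small Python sketch makes the encoding concrete (the helper primes is mine); the example happens to encode a stage like the one in Exercise 1 below.

```python
def primes(k):                     # first k primes, by trial division
    ps, x = [], 2
    while len(ps) < k:
        if all(x % p for p in ps):
            ps.append(x)
        x += 1
    return ps

def encode(n, config):             # s = 2^n * 3^n1 * 5^n2 * ...
    s = 1
    for p, e in zip(primes(len(config) + 1), [n] + config):
        s *= p ** e
    return s

def decode(s, k):                  # read off node and first k locations
    exps = []
    for p in primes(k + 1):
        e = 0
        while s % p == 0:
            s //= p
            e += 1
        exps.append(e)
    return exps[0], exps[1:]

s = encode(5, [3, 0, 1])           # node 5; 3 pebbles at location 1, 1 at 3
print(s, decode(s, 3))             # 6048 (5, [3, 0, 1])
```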
What does our program (α, β, γ, ε) do to a stage of a calculation? It determines the next stage; in particular,

C-1. If n = 0, it changes node n to node α(n).

C-2. If 0 < n ≤ k and ε(n) = plus, it adds one pebble to location γ(n) and changes node n to node α(n).

C-3. If 0 < n ≤ k, ε(n) = minus and γ(n) is nonempty, it takes one pebble from location γ(n) and changes node n to node α(n).

C-4. If 0 < n ≤ k, ε(n) = minus and γ(n) is empty, it leaves the configuration unchanged, but changes node n to node β(n).

C-5. If n = k + 1, there is no next stage.
Assuming that n ≤ k, so that there is a next stage, let its Gödel number be G(s). Then, putting n = exp_2(s), we obtain

G(s) = s * 2^{α(0)}                    if n = 0
     = s * 2^{α(n)−n} * p(γ(n))        if n > 0 and ε(n) = plus
     = s * 2^{α(n)−n} / p(γ(n))        if n > 0, ε(n) = minus and p(γ(n)) | s
     = s * 2^{β(n)−n}                  if n > 0, ε(n) = minus and ¬(p(γ(n)) | s)

(Here p is the prime function of Section 1.7, so that the prime p(i) records the content of location i.) So far, G(s) is undefined when n ≥ k + 1 and also when s = 0. Let us arbitrarily put

G(s) = 0 if s = 0 or n ≥ k + 1
With a little bit of extra work, one can rewrite the above as follows:

G(s) = g_1(s)  if f_1(s) = 0
     = g_2(s)  if f_2(s) = 0
     = g_3(s)  if f_3(s) = 0
     = g_4(s)  if f_4(s) = 0
     = 0       if s = 0 or exp_2(s) ≥ k + 1

where g_1, g_2, g_3, g_4 and f_1, f_2, f_3, f_4 are recursive functions; for example, g_1(s) = s * 2^{α(0)} and f_1(s) = exp_2(s). The others are a little harder. Now for each s, exactly one of the five cases holds, hence

G(s) = g_1(s) δ(f_1(s)) + ··· + g_4(s) δ(f_4(s))

and so G(s) is seen to be a recursive function.
Iterating the function G(s), we obtain the function G^t(s), which may be regarded as a function of two variables and is defined recursively thus:

G^0(s) = s
G^{S(t)}(s) = G(G^t(s))
What do we mean by saying that the program (α, β, γ, ε) calculates the function y = f(x)? Take X = 1 and Y = 2; then the initial and final configurations are given by

1 2
x 0

and

1 2
x f(x)

and hence the Gödel numbers of the initial and final stages are

s = 2^0 3^x 5^0,    G^{m(x)}(s) = 2^{k+1} 3^x 5^{f(x)}

where

m(x) = smallest t such that exp_2 G^t(3^x) = k + 1

Therefore

f(x) = exp_5 G^{m(x)}(3^x)

Since f(x) is constructed from known recursive functions with the help of the minimization and substitution schemes, it follows that f(x) is recursive. We have thus proved the following:
2.2.1 Theorem Every calculable numerical function is recursive.
As we have already established the converse of this in Section 1.6, we conclude that
"calculable" and "recursive" are synonymous terms.
Are there any numerical functions which are not calculable or recursive? We shall
answer this question after a short digression.
Exercises
1. A calculation has arrived at node 5 of the flow diagram of a certain program. There are 3 pebbles at A, 1 pebble at C and no pebbles at any other location. What is the Gödel number of this stage of the calculation?

2. Assuming that a certain program contains enough instructions (nodes), describe the stage of a calculation corresponding to the Gödel number 90,000.
2.3 Digression: enumerations
A set X is said to be countable (or enumerable) if it may be put into one-to-one correspondence with the set of natural numbers N, that is, if there is a function f : N → X which is one-to-one and onto.
For example, the following sets are countable.
{0, 1, 2, 3, . . .} = set of natural numbers
{1, 2, 3, 4, . . .} = set of positive integers
{0, 2, 4, 6, . . .} = set of even natural numbers
{0, 1, 4, 9, . . .} = set of perfect squares
{2, 3, 5, 7, . . .} = set of prime numbers
The enumeration function f in these examples is given by

f(n) = n, n + 1, 2n, n^2, p(n)

all of which are recursive functions. It is easily seen that every infinite subset X of N is countable, although the function f : N → X will not in general be recursive, as will be seen later.
A countable set X need not be a subset of N. For example, the set of integers is countable:

{0, 1, −1, 2, −2, ...}

A more interesting example is the set N × N of all pairs of natural numbers. This is enumerated thus:

(0,0),
(0,1), (1,0),
(0,2), (1,1), (2,0),
(0,3), (1,2), (2,1), (3,0),
...

We first take all pairs with sum 0, then all pairs with sum 1, etc. Now the number of entries in the first n lines of the above array is 1 + 2 + 3 + ··· + n = n(n + 1)/2. It is easily checked that the enumeration f : N × N → N is given by f(x, y) = ((x + y)(x + y + 1)/2) + x, clearly a recursive function in two variables.
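In code, the pairing function and its inverse look like this (a sketch; the search in unpair is the obvious one, not the book's):

```python
def pair(x, y):
    return (x + y) * (x + y + 1) // 2 + x

def unpair(z):
    d = 0
    while (d + 1) * (d + 2) // 2 <= z:   # find the diagonal d = x + y
        d += 1
    x = z - d * (d + 1) // 2             # offset within the diagonal
    return x, d - x

print(pair(1, 2), unpair(pair(1, 2)))    # 7 (1, 2)
```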
As our next example, consider the set of all finite sequences of natural numbers. With each finite sequence (a_0, a_1, ..., a_n) we associate the number 2^{a_0+1} 3^{a_1+1} ··· p(n)^{a_n+1}. This gives a one-to-one correspondence between the set of finite sequences of natural numbers and a subset of N.
An example of a set which is not countable is the set of all properties of natural numbers. Suppose this set can be enumerated; then all properties of natural numbers are contained in the sequence P_0, P_1, P_2, .... Let us now define a new property P as follows: P(x) if and only if ¬P_x(x). We don't care which natural numbers have property P, but we do know that P is not in the above sequence. For suppose P = P_k; then

P(k) if and only if P_k(k)
     if and only if ¬P(k)
clearly a contradiction. It follows that the set of all properties of natural numbers cannot
be enumerated.
The above argument is due to Cantor, and it is known as Cantor's diagonal argument. Another application of this argument is to the set of all functions N → N. Suppose this set can be enumerated thus: f_0, f_1, f_2, .... Let us now define a function f : N → N by putting f(x) = f_x(x) + 1. Suppose this function were a member of the sequence; then f = f_k for some k ∈ N. Hence

f_k(x) = f_x(x) + 1

for all x ∈ N. Taking x = k, we get the contradiction

f_k(k) = f_k(k) + 1

Therefore the set of functions N → N cannot be enumerated.
A third application of Cantor's diagonal argument deals with the set of all real numbers x such that 0 ≤ x < 1. Suppose this set can be enumerated thus:

a_1 = 0.a_{11} a_{12} a_{13} ···
a_2 = 0.a_{21} a_{22} a_{23} ···
a_3 = 0.a_{31} a_{32} a_{33} ···
...

where the a_{ij} are digits from 0 to 9. It will be convenient to assume that none of these decimal expansions involve an infinite sequence of nines, since every such expansion can be replaced by another one with an infinite sequence of zeros. (For example, 0.1999··· = 0.2000···.) Let us now construct a number

b = 0.b_1 b_2 b_3 ···

where b_1 is neither a_{11} nor 9, b_2 is neither a_{22} nor 9, and so on. Then b ≠ a_1 since they differ in the first decimal place, b ≠ a_2 since they differ in the second decimal place, b ≠ a_3 since they differ in the third decimal place, and so on. But 0 ≤ b < 1, so we must have b = a_n for some n, a contradiction. It follows that the set of real numbers in the unit interval cannot be enumerated.
We shall meet yet another application of Cantor's diagonal argument in the next
section.
Exercises
1. Explain why each of the following sets is countable:
a. the set of odd natural numbers,
b. the set of cubes of natural numbers,
c. the set of odd integers,
d. the set of rational numbers,
e. the set of 3-tuples of natural numbers.
2. Explain why the following sets are not countable:
a. the set of real numbers,
b. the set of complex numbers,
c. the set of irrational numbers.
2.4 Example of a noncalculable function
Every calculable numerical function N → N can be calculated with the help of some program (α, β, γ, ε). The program may be described explicitly as an array of numbers, provided we encode plus and minus as numbers, for example, as 1 and 0 respectively:
1       ···   k       k + 1
α(1)    ···   α(k)    α(k + 1)
β(1)    ···   β(k)    β(k + 1)
γ(1)    ···   γ(k)    γ(k + 1)
ε(1)    ···   ε(k)    ε(k + 1)
It is clear that these arrays can be enumerated, hence the set of programs is countable. Thus the set of calculable functions N → N is countable. On the other hand, the set of all numerical functions N → N is not countable, as we saw in the previous section. Therefore not all numerical functions are calculable. We shall actually construct a numerical function which is not calculable.
Let P_0, P_1, P_2, ... be an enumeration of all programs. Suppose an initial configuration of x pebbles at location 1 and 0 pebbles at location 2 is transformed by program P_n to a final configuration of x′ pebbles at location 1 and y pebbles at location 2. (We do not care what happens at other locations.) Then we shall write f_n(x) = y and say the program P_n calculates f_n at x. If the calculation does not terminate, we shall say that f_n(x) is undefined. Thus f_n(x) is a partial recursive function in x. (Note that in the present context we do not insist that x′ = x as in an earlier chapter. However, it is easily seen that some slightly modified program would calculate f_n(x) in the earlier sense.) Writing f(x, n) = f_n(x), we obtain a partial recursive function in two variables.
Let us say that n has property G if the above calculation terminates for each x, that is, if f_n(x) is a function, not just a partial function. Let g(y) be a numerical function
which enumerates all numbers with property G. For example, we may write

g(0) = smallest n such that G(n)
g(S(y)) = smallest n such that n > g(y) ∧ G(n)

The right-hand side of this last equation is well defined, since there are evidently infinitely many natural numbers with property G, as there are infinitely many calculable functions. Thus g(y) is a numerical function which enumerates all natural numbers with property G.
While f(x, n) is only a partial function,

h(x, y) = f(x, g(y)) = f_{g(y)}(x)

is a genuine function of two variables, since g(y) belongs to the set G. From this we obtain the function h(x, x) of one variable by diagonalizing, that is, by putting y = x. We claim that the function h(x, x) is not calculable.

Indeed, suppose it is calculable; then so is h(x, x) + 1. Let P_m be a program which calculates the latter function. Clearly, m has property G, hence we may write m = g(k) for some natural number k. Therefore

h(x, x) + 1 = f_m(x) = f(x, m) = f(x, g(k)) = h(x, k)

Now, this is true for all x ∈ N. Taking x = k, we get

h(k, k) + 1 = h(k, k)

a contradiction.
We have thus exhibited a function h(x, x) which is not calculable on an abacus. We
may squeeze a little more out of our argument.
It can be shown that f(x, n) is a partial recursive function. The proof of this fact is not difficult, but a little tedious, and we shall skip it. Let us assume this result. We claim it then follows that g(y) is not a recursive function.

Indeed, suppose g(y) is recursive. Then so is h(x, y) = f(x, g(y)), hence also h(x, x). But we know that h(x, x) is not calculable; in other words, it is not recursive. Thus g(y) is not recursive. Since g(y) was any numerical function which enumerates all natural numbers with property G, it follows that no recursive function can enumerate these numbers. We shall return to this statement in Section 2.6.

The program for calculating f(x, y) may be used as a super-program for calculating all partial recursive functions in one variable, since every such partial function has the form f_n(x) = f(x, n) for some n.
2.5 Turing programs
We have seen how to perform calculations on an abacus consisting of an arbitrary number
of locations, each location being capable of storing arbitrarily many pebbles. One can
overcome one's misgivings about an infinite number of locations by reassuring oneself that
it is always possible to dig another hole in the ground when needed. But holes of infinite
depth are less easy to accept.
Let us see what happens if we restrict the storage capacity of each location to at most m pebbles. One has in mind the classical abacus with m = 9, to facilitate representations of numbers in the scale of 10. But we may as well go all the way and take m = 1. Thus each location can hold at most one pebble. Let us call this the binary abacus, because each location must be in one of two states: empty or full.
The elementary instruction X+ of the original abacus makes no sense for the binary abacus, as one cannot put a pebble into a full location. Instead we shall adopt a new elementary instruction X+ with two exits. If X is empty, we are asked to put a pebble into location X and proceed along the left arrow. If X is full, we are asked to leave X alone and proceed along the right arrow. We may now enquire whether every calculation that can be performed on the original abacus can also be carried out on the binary abacus. Instead of arranging the locations in a single sequence

A_1 A_2 A_3 ···
we may put them in a double array as follows:

A_13  A_23  A_33  ···
A_12  A_22  A_32  ···
A_11  A_21  A_31  ···

Instead of having n pebbles at location A_1, we may have one pebble each at locations A_11, A_12, ..., A_1n. (It is convenient to represent the number n in this way, and not in the scale of 2.) The old instruction A_1+ may now be translated into a program for putting one pebble into the first empty location in the column A_11, A_12, .... What does such a
program look like?
[Flow diagram, infinite: start → A_11+; if A_11 was empty, stop; if full, go on to A_12+; if A_12 was empty, stop; if full, go on to A_13+; and so on.]
Unfortunately, this will be an infinite flow diagram, unless we change our method of
describing programs. Indeed, the following finite flow diagram will explain what we have
in mind:
[Flow diagram: start → +; if the location was empty, stop; if full, go "up" and return to +.]
Here "up" means "one location higher".
Turing's idea is even more radical. He arranges the locations in one line extending infinitely in both directions. Thus the locations are in a natural one-to-one correspondence with the integers:

..., A_{−2}, A_{−1}, A_0, A_1, A_2, ...

Moreover, he never allows you to move from one location to another, unless the new location is adjacent to or coincides with the old location.
A Turing program is built up from the following elementary instructions:

[Diagram: "start at X"; the instructions +right, +left, +stay and −right, −left, −stay, each with two exit arrows; and stop.]
The first instruction says: start at location X. The second instruction says: add one
pebble to the location at which you are, if it doesn't have one already, then move to the
location on the right. If this new location is full, go to the node indicated by the left
arrow. If the new location is empty, go to the node indicated by the right arrow.
The other instructions are interpreted similarly.
One now represents the number x on the binary abacus by x + 1 pebbles at consecutive
locations. (Thus zero is represented by one pebble.) One says that a function y = f(x) is
computed by a Turing program if the calculation, set to start with a representation of x (the last pebble being at the starting location), ends with a representation of x followed by an empty location followed by a representation of f(x) (the last pebble being at the terminal location).
It can be shown that a numerical function can be computed by a Turing program if
and only if it is calculable, that is, recursive.
Actually, Turing did not talk about programs but about machines. We shall give a brief description of these.

A Turing machine consists of a central unit which is scanning one square of an infinite tape. The central unit is in one of a finite number of states:

S_0 = start, S_1, S_2, ..., S_m

Each square of the tape has written on it a word of the finite vocabulary Σ:

a_1, ..., a_n, or 0 = blank

For example, we may have the following situation:

[Diagram: a central unit that reads and writes, scanning one square of a tape.]

The scanned square may be read and written upon. The tape is capable of moving in either direction. The machine is capable of making the following kinds of moves:

(1) (σ, S_i) → (τ, S_j)
erase σ, print τ, change state from S_i to S_j.

(2) (σ, S_i) → (right, S_j)
move tape one square to the right (or move scanner one square to the left), change state from S_i to S_j.

(3) (σ, S_i) → (left, S_j)
move tape to the left and change state.

We assume the machine is deterministic, that is, each move is uniquely determined by (σ, S_i), σ ∈ Σ.
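Here is a small Python sketch of such a machine (the table format and the step cutoff are my own devices, not Turing's); it shows moves of kinds (1) and (2) acting on a tape, halting when no move is defined.

```python
def run_tm(table, tape, state="start", pos=0, fuel=10_000):
    tape = dict(enumerate(tape))             # square index -> symbol
    while (tape.get(pos, 0), state) in table and fuel > 0:
        action, state = table[(tape.get(pos, 0), state)]
        if action == "left":
            pos -= 1                         # scanner one square left
        elif action == "right":
            pos += 1                         # scanner one square right
        else:
            tape[pos] = action               # erase and print
        fuel -= 1
    return tape, state

# Toy machine: turn 0s into 1s until it reads a 1 in state "start".
table = {(0, "start"): (1, "S1"), (1, "S1"): ("right", "start")}
print(run_tm(table, [0, 0, 1]))              # all three squares read 1
```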
Without going into technical details we shall say this: if the machine is presented with
a string of words on the tape when the central unit is in the state start and ultimately
comes to a stop, then the string is said to have been accepted by the machine. One
may ask, what sets of strings consist precisely of the acceptable strings of some Turing
machine? Not surprisingly, the answer is: the recursively enumerable sets discussed in
Section 2.6 below.
To compare Turing programs with Turing machines, we take Σ = {0, 1} and adopt the following dictionary:

location = square on two-way infinite tape
pebble = printing of symbol "1"
program = machine
node = internal state
location at one stage of calculation = scanned square
The proof that a function is computable on a Turing machine if and only if it is
recursive may be found in the book by Kleene. Alternatively, one may prove directly that
a function is computable on a Turing machine if and only if it is calculable on the abacus;
such a proof appears in the recent book by Boolos and Jeffrey.
2.6 Effective procedures
Mathematicians believe that a numerical function which can be effectively calculated or computed by any means whatsoever is calculable or computable in the technical sense of these words, hence recursive. This statement, known as the Church–Turing Thesis, cannot be proved, as we do not know what is meant by "effectively computable by any means whatsoever".

Given a set P of natural numbers, we may ask this: Is there an effective procedure for deciding whether any natural number n is an element of P? As a consequence of the Church–Turing Thesis we assert that there is such an effective procedure if and only if P is a recursive set, that is, there exists a calculable function f : N → N such that

∀x ∈ N, x ∈ P if and only if f(x) = 0.

A related concept is embodied in the following.

2.6.1 Definition A set P of natural numbers is recursively enumerable if it is the image of a recursive function, that is, if there is a recursive function f : N → N such that P = {f(0), f(1), f(2), ...}, with repetitions permitted.
In particular, every finite set of natural numbers is recursively enumerable. For example, the set {a, b, c, d} is the image of the function

f(x) = a if x = 0,  b if x = 1,  c if x = 2,  d if x ≥ 3

This function is recursive, since

f(x) = a * δ(x) + b * δ(|x − 1|) + c * δ(|x − 2|) + d * δ(3 ∸ x)
2.6.2 Theorem A set P of natural numbers is recursive if and only if P and its complement N − P are each recursively enumerable or finite.

Proof Assume that P is recursive. If P is infinite, we may define a function g as follows:

g(0) = smallest y such that y ∈ P
g(S(x)) = smallest y such that y ∈ P ∧ y > g(x)

Clearly P = {g(0), g(1), ...}, and so g affords a recursive enumeration of P.

If N − P is infinite, we may similarly define a function h which will recursively enumerate N − P:

h(0) = smallest y such that y ∉ P
h(S(x)) = smallest y such that y ∉ P ∧ y > h(x)

Conversely, let us assume first that P and N − P are both recursively enumerable, say g enumerates P and h enumerates N − P. Since every natural number x is an element of P or of N − P, there exists a number y such that x = g(y) or x = h(y). Put

e(x) = smallest y such that x = g(y) ∨ x = h(y)

Then x ∈ P if and only if g(e(x)) = x, that is, f(x) = 0, where f(x) = |g(e(x)) − x|.

Next, let us assume that P is recursively enumerable and N − P is finite, say N − P = {b_1, b_2, ..., b_m}. Then we define g as above and put

e(x) = smallest y such that x = g(y) ∨ x = b_1 ∨ x = b_2 ∨ ... ∨ x = b_m

Then we proceed as above. There remains the case when N − P is recursively enumerable and P is finite, and this case is treated quite similarly. ■
To illustrate the above notions, recall from Section 2.4 that a natural number n has property G if the calculation of f_n(x) by the program P_n terminates for every x. We may as well regard G as a set; then what we showed is that no recursive function can enumerate G.
In the present terminology, this means that G is not recursively enumerable. It follows
from the above theorem that G is also not recursive.
In many applications of these ideas we are interested not in natural numbers, but in
strings which are made up of symbols of a finite alphabet or words of a finite vocabulary.
By putting these strings in alphabetic or lexicographic order, we obtain a one-to-one
correspondence between strings and natural numbers. We may then speak of recursive
and recursively enumerable sets of strings.
For example, let us consider the finite vocabulary {(, ), p, q, r, ′, ¬, ⇒, ∨, ∧}, which will allow us to deal with the propositional calculus, the propositional variables being p, q, r, p′, q′, r′, p″, etc.

It is easily seen which strings are formulas (e.g. (p ⇒ q) ∧ r) and which are not (for example pq ⇒ (′)). Thus the set of formulas is recursive. So is the set of tautologies, because the method of truth-tables allows us to decide effectively which formulas are tautologies.
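The truth-table method is a few lines of code. In this Python sketch, formulas are modelled as Boolean functions (a convention of mine); is_tautology simply tries all 2^n rows of the table.

```python
from itertools import product

def is_tautology(formula, nvars):
    # try every assignment of truth values to the nvars variables
    return all(formula(*row) for row in product([True, False], repeat=nvars))

imp = lambda p, q: (not p) or q            # material implication

print(is_tautology(lambda p: imp(p, p), 1))               # True
print(is_tautology(lambda p, q: imp(p and q, p), 2))      # True
print(is_tautology(lambda p, q: imp(p, q) and q, 2))      # False
```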
To get a language adequate for all mathematics, we adjoin to the above vocabulary the set {x, y, z, ∀, ∃, =, ∈}. Again, the set of formulas is recursive. However, it can
be shown that the set of theorems is not recursive, that is, there is no effective method
for deciding whether a formula is a theorem. This is just as well, or mathematicians
could be replaced by computers. Still, the set of theorems is easily seen to be recursively
enumerable. It then follows from the above theorem that the set of formulas which are
not theorems is not recursively enumerable.
As another example, we may take as our vocabulary the set of all English words, say
all the words listed in the Oxford dictionary. This set is quite large, but still finite. We
are then interested in those strings which are grammatical sentences. Linguists are agreed
that the set of grammatical sentences is recursively enumerable. One might even expect
that this set is recursive.
Exercise
1. State whether each of the following sets is
a. recursive,
b. recursively enumerable:
(i) the set of theorems in mathematics,
(ii) the set of tautologies in the propositional calculus,
(iii) the set of statements in mathematics which are not theorems,
(iv) the set of formulas of the propositional calculus which are not tautologies,
(v) the set of programs for the abacus,
(vi) the set of programs which calculate a function y = f(x).
Chapter 3
Syntactic types and semigroups
3.1 A simple grammar using types
It is the aim of grammar to describe the set of sentences in a language. There are several kinds of grammars that have been studied from a mathematical point of view. We shall first consider the method of syntactic types, also known as "categorial grammar". We shall illustrate this by looking at a small fragment of English.
We first define the set of types inductively:

T-1. s is a type (s = sentence)
T-2. n is a type (n = name)
T-3. if a and b are types so is a·b
T-4. if a and b are types so is a/b (read "a over b")
T-5. if a and b are types so is a\b (read "a under b")

For the moment we shall not accept any types other than those which may be constructed from T-1 to T-5. For example, the following expressions are types, where parentheses are used as ordinarily in mathematics:

s, s/n, (s/n)·n, s/(n·n), n\(s/n)

We introduce a binary relation → between types. It is our intention to interpret "a → b" to mean that every expression of type a is also of type b. The relation → is subject to the following rules, for the time being:

Ar-1. a → a,
Ar-2. if a → b and b → c, then a → c,
Ar-3. if a → b and c → d, then a·c → b·d,
Ar-4. (a/b)·b → a,
Ar-5. a·(a\b) → b.
Later we shall consider some additional rules.
We also use the arrow between words of the language studied and types. Thus John → n is intended to mean that John has type n. For the sake of illustration we shall consider a small fragment of English consisting of the words John, Jane, milk, works, rests, likes, calls, poor, fresh, today, and, while, with, for.
We begin by assigning types to these words. The first three words will be assigned the type n, as they are names. (Indeed, milk is the name of a substance.) If works is assigned type x, the sentence John works will have type n·x. Rule T-5 suggests we take x = n\s; then this sentence has type n·(n\s) → s. Since John likes Jane is a sentence, likes Jane should have the same type as works, that is, n\s. Rule T-4 then suggests we let likes → (n\s)/n, so we have the analysis
John  likes   Jane
 n   (n\s)/n   n
     \__________/
         n\s
\_________________/
         s
Continuing in this manner, we arrive at the following type assignment for our fragment of English:

John     n                    (name)
Jane     n
milk     n
works    n\s                  (intransitive verb)
rests    n\s
likes    (n\s)/n              (transitive verb)
calls    n\s, (n\s)/n
poor     n/n                  (adjective)
fresh    n/n
today    s/s, s\s             (adverb)
and      (s\s)/s              (conjunction)
while    (s\s)/s
with     (s\s)/n              (preposition)
for      (s\s)/s, (s\s)/n
A number of examples will show that one can now check the sentence-hood of strings
that were not used in arriving at the above type assignments. Having learned a grammar,
one should be able to use it to construct new sentences.
(poor John) (likes (fresh milk))
 n/n   n    (n\s)/n  n/n   n
 \______/            \_______/
    n                    n
            \________________/
                  n\s
\____________________________/
              s
((John works) (for Jane)) (while (Jane rests))
   n   n\s   (s\s)/n  n   (s\s)/s  n   n\s
 \________/  \_________/           \______/
     s           s\s                  s
 \_____________________/  \______________/
           s                    s\s
 \________________________________________/
                    s
Of course, many meaningless sentences also turn out to be grammatical according to our
rules, for example:
poor milk calls fresh John
milk rests for milk while milk works
On the other hand, some English sentences have escaped our net, for example,
Jane calls John fresh
This could be remedied by adding some further type assignments to the above list.
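These checks are mechanical enough to automate. The following sketch is mine, not the book's: it encodes types as nested tuples and applies only the contraction rules Ar-4 and Ar-5, backtracking over the order in which adjacent pairs are contracted.

```python
# A sketch (mine): check sentencehood by contracting adjacent types with
# Ar-4: (a/b)·b → a  and  Ar-5: a·(a\b) → b.
# A type is 'n', 's', ('/', a, b) for a/b, or ('\\', a, b) for a\b.

def reduces_to(types, goal='s'):
    if len(types) == 1:
        return types[0] == goal
    for i in range(len(types) - 1):
        x, y = types[i], types[i + 1]
        if isinstance(x, tuple) and x[0] == '/' and x[2] == y:     # Ar-4
            if reduces_to(types[:i] + (x[1],) + types[i + 2:], goal):
                return True
        if isinstance(y, tuple) and y[0] == '\\' and y[1] == x:    # Ar-5
            if reduces_to(types[:i] + (y[2],) + types[i + 2:], goal):
                return True
    return False

LEXICON = {                      # the type assignments of this section
    'John': 'n', 'Jane': 'n', 'milk': 'n',
    'works': ('\\', 'n', 's'), 'rests': ('\\', 'n', 's'),
    'likes': ('/', ('\\', 'n', 's'), 'n'),
    'poor': ('/', 'n', 'n'), 'fresh': ('/', 'n', 'n'),
}

words = 'poor John likes fresh milk'.split()
print(reduces_to(tuple(LEXICON[w] for w in words)))   # True
```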
Let us try to adjoin the pronouns he and him to our fragment of English. Since we
cannot say poor him works and Jane likes he, we must not assign the type n to he and
him. Attempting to justify the sentences he works and Jane (likes him), we are led to the
following assignments:
he → s/(n\s)
him → ((n\s)/n)\(n\s)
The type of him seems rather complicated. Had we allowed the constructions (Jane likes)
John and (Jane likes) him, we should have arrived at different types for likes and him,
namely
likes → n\(s/n)
him → (s/n)\s
Exercise
1. Using type assignments, check that the following are grammatical sentences:
a. poor poor John works
b. John works today today
c. he works for Jane today
d. he works, for Jane rests today
3.2 Analysis of the English verb phrase
It does not seem likely that we can push the method of syntactic types very far in appli-
cations to natural languages, as long as we insist that all types be built up from s and n
alone. In this chapter we shall make use of five primitive types:
s = complete declarative sentence
n = name
i = infinitive of intransitive verb
p = present participle of intransitive verb
q = past participle of intransitive verb
Accordingly we adopt the type assignments:
work → i
working → p
worked → q
Let us now look at the sentence

John (must work)
 n          i

It is clear that must work should receive the same type as works, that is, n\s. Since work has type i, we are led to assign

must → (n\s)/i
The same type will be assigned to other "modal auxiliaries" such as will and may.
In the same way, by looking at the sentences

John (is working)      John (has worked)
 n          p           n          q

we are led to assign

is → (n\s)/p
has → (n\s)/q
Similarly we arrive at the following type assignments:

John must have worked,        have → i/q
 n   (n\s)/i  i/q  q

John has been working,        been → q/p
 n   (n\s)/q  q/p  p

John is calling Jane,         calling → p/n
 n   (n\s)/p  p/n  n

John must call Jane,          call → i/n
 n   (n\s)/i  i/n  n

John has called Jane,         called → q/n
 n   (n\s)/q  q/n  n

John is called,               is → (n\s)/(q/n)
 n   (n\s)/(q/n)  q/n

We summarize this information in the following table:
              infinitive   present      past         third person
                           participle   participle   singular
                                                     present tense
intransitive  work         working      worked       works
verb          i            p            q            n\s

transitive    call         calling      called       calls
verb          i/n          p/n          q/n          (n\s)/n

modal                                                must
auxiliary                                            (n\s)/i

progressive   be                        been         is
auxiliary     i/p                       q/p          (n\s)/p

perfect       have                                   has
auxiliary     i/q                                    (n\s)/q

passive       be           being        been         is
auxiliary     i/(q/n)      p/(q/n)      q/(q/n)      (n\s)/(q/n)
There are of course many other transitive and intransitive verbs; even have may occur as a transitive verb. On the other hand, it is difficult to think of many auxiliary verbs. Here, presumably, is an almost complete list:
modal auxiliaries: must, will, shall, can, may, would, should, could, might.
progressive auxiliaries: be, stop, start, keep.
passive auxiliaries: be, get.
The blank squares in the table indicate that there do not exist forms to fit those
slots. In the column "modal auxiliaries" we are not surprised to note the absence of the
nonexistent forms *musting, *musted, and *to must. However, there do exist forms being,
having, and had; so why do these not appear in the remaining three squares? Because we cannot say *John is being working, *John is having worked, *John has had worked. (An asterisk indicates that the expression to which it is attached does not appear in standard English.)
Our table could be enlarged by allowing for plural, past tense and 1st person forms.
This project would require additional primitive types, and we shall refrain from carrying
it out.
Using the above table, one can deduce that many strings of words are grammatical sentences, even if these strings were not used in constructing the table. Some such sentences turn out to be pretty unusual, for example

Jane has      been  being    called
 n   (n\s)/q  q/p   p/(q/n)  q/n
                    \___________/
                         p
              \__________________/
                       q
     \___________________________/
                n\s
\_________________________________/
                s
It would seem difficult to rule this out, without also ruling out
Jane has stopped being called
Jane has been getting called
Adverbs have been assigned the type s\s, as in

(John (must work)) today
 n    (n\s)/i  i   s\s

Sometimes it may seem more natural to let an adverb have type i\i, as in

John must (work well)
 n  (n\s)/i   i   i\i
However, in John works well it would then be necessary to say that works has not only the simple type n\s, but also the compound type ((n\s)/i)·i. Similarly one can distinguish between "coordinate" conjunctions and "subordinate" conjunctions, as in the following sentences:

John works  and    Jane rests
 n   n\s  (s\s)/s   n   n\s

John must work  while   Jane rests
 n  (n\s)/i  i (i\i)/s   n   n\s
Exercise
1. Using type assignments, show that the following are grammatical sentences.
a. John must have been working
b. John has stopped calling Jane
c. Jane may have stopped getting called
d. John is having milk
3.3 Syntactic calculus
We shall investigate a deductive system that will enable us to handle syntactic types in
the same way as the propositional calculus enables us to handle propositions.
Primitive types are s, n, i, p, q, and perhaps others, according to need.
Types are defined inductively thus: primitive types are types, and if a and b are types
so are a ■ b, a/b, and a\b.
A formula is an expression of the form a > b, where a and b are types.
We adopt one axiom scheme:

(1)  a → a

and five rules of inference:

(2)  a → b    b → c        (3)  a·b → c        (4)  a → c/b
     --------------             --------            --------
         a → c                  a → c/b             a·b → c

(5)  a·b → c        (6)  b → a\c
     --------            --------
     b → a\c             a·b → c
A formula is said to be a theorem if it can be deduced from the axioms and rules of
inference by means of a proof. We shall illustrate what we mean by a proof by looking
at some examples.
a/b → a/b                 a\b → a\b
------------ (4)          ------------ (6)
(a/b)·b → a               a·(a\b) → b
These two proofs show that rules Ar-4 and Ar-5 of Section 3.1 are in fact theorems of the syntactic calculus. The theorems established by the following two proofs are new:
a·b → a·b                 a·(a\b) → b
------------ (3)          ------------- (3)
a → (a·b)/b               a → b/(a\b)

For example, the last proof establishes the theorem n → s/(n\s). This asserts that every
name also has the type s/(n\s), the type of the pronoun he in Section 3.1. Indeed, in any
context, he may be replaced by John, but not conversely. (It may also be replaced by Jane;
distinctions of gender do not enter the very limited grammar discussed in Section 3.1.)
Rule Ar-3 of Section 3.1 does not appear among our present rules of inference. However, it can be derived: given a → b and c → d, there is a proof which leads from these assumptions to a·c → b·d. Here it is, a good example of a proof in tree form:

               b·c → b·c                           b·d → b·d
              ----------- (3)                     ----------- (5)
   a → b      b → (b·c)/c              c → d      d → b\(b·d)
  (given)                             (given)
  ------------------------ (2)       ------------------------ (2)
        a → (b·c)/c                         c → b\(b·d)
       ------------- (4)                   ------------- (6)
         a·c → b·c                           b·c → b·d
        --------------------------------------------- (2)
                         a·c → b·d

We shall record this derived rule for future reference:

(α)  a → b    c → d
     ---------------
        a·c → b·d
Other derived rules of inference are:

(β)  a → b    c → d
     ---------------
       a/d → b/c

(γ)  a → b    c → d
     ---------------
       d\a → c\b
In applying the syntactic calculus to English, we assumed that English sentences are structured in a certain way. For example, we wrote John (likes Jane) but not *(John likes) Jane, and thus we arrived at the type (n\s)/n for likes and not n\(s/n). As we shall see, these two types are equivalent in the associative syntactic calculus.
The associative syntactic calculus differs from the syntactic calculus by virtue of two additional axiom schemes:

(7)  (a·b)·c → a·(b·c)
(8)  a·(b·c) → (a·b)·c

We shall abbreviate "x → y and y → x" to x ↔ y and say x is equivalent to y. Then (7) and (8) may be combined to

a·(b·c) ↔ (a·b)·c
The following are now theorems in the associative syntactic calculus:

a\(b/c) ↔ (a\b)/c
a/(b·c) ↔ (a/c)/b
(a·b)\c ↔ b\(a\c)
For example, we prove a\(b/c) → (a\b)/c by putting d = a\(b/c) and proceeding as follows:

                        d → a\(b/c)
                        ------------ (6)
                         a·d → b/c
                        ------------ (4)
a·(d·c) → (a·d)·c  (8)  (a·d)·c → b
----------------------------------- (2)
          a·(d·c) → b
          ------------ (5)
           d·c → a\b
          ------------- (3)
          d → (a\b)/c
Another theorem is (a/b)·(b/c) → a/c, which is proved as follows, making use of the derived rule of inference (α):

                           b/c → b/c
                          ------------ (4)
            a/b → a/b     (b/c)·c → b              a/b → a/b
            ------------------------- (α)          ------------ (4)
            (a/b)·((b/c)·c) → (a/b)·b              (a/b)·b → a
            -------------------------------------------------- (2)
((a/b)·(b/c))·c → (a/b)·((b/c)·c)  (7)    (a/b)·((b/c)·c) → a
-------------------------------------------------------------- (2)
                ((a/b)·(b/c))·c → a
                -------------------- (3)
                (a/b)·(b/c) → a/c
Exercises

1. Show that the following is a derived rule of inference in the syntactic calculus: if a → b and c → d then a/d → b/c.

2. In the associative syntactic calculus prove the following:

a. a/b → (a/c)/(b/c)
b. (a\b)/c → a\(b/c)
c. a/(b·c) → (a/c)/b
3.4 Multiplicative systems
A number of algebraic systems have been used, or could be used in applications to lin-
guistics. We shall begin by considering a multiplicative system (A, *), consisting of a set A and a binary operation * : A × A → A. No associative law is postulated, so we must distinguish between (a*b)*c and a*(b*c).
Here is one illustration: (N, *), the set of natural numbers with the operation a*b = a^b. Clearly, the associative law does not hold, as (a^b)^c = a^(bc) ≠ a^(b^c) in general. Of course, many multiplicative systems do satisfy the associative law; they are then called semigroups. Examples of semigroups are (N, +), (N, ·), (N, max), (N, min), (N, gcd), (N, lcm), (N, first), (N, last), where
gcd(a, b) = greatest common divisor of a and b
lcm(a, b) = least common multiple of a and b
first(a, b) = a
last(a, b) = b
The operations first and last are the projections we have met earlier. They are the only ones among the listed operations which are not commutative; for example,

first(a, b) = a ≠ b = first(b, a)

in general. In all these examples, the associative law is established on the same principle, for example,

max(max(a, b), c) = max(a, b, c) = max(a, max(b, c))
first(first(a, b), c) = first(a, b, c) = first(a, first(b, c))
To give a motivation for the syntactic calculus studied in the preceding section, we shall define three operations on the subsets of a multiplicative system (A, ·). Let X, Y, and Z be subsets of A; then we define

X·Y = {x·y | x ∈ X ∧ y ∈ Y}
Z/Y = {x ∈ A | ∀y∈Y, x·y ∈ Z}
X\Z = {y ∈ A | ∀x∈X, x·y ∈ Z}
It is now easily seen that

X·Y ⊆ Z  if and only if  X ⊆ Z/Y
         if and only if  Y ⊆ X\Z

and this should give some motivation for rules (3) to (6) in Section 3.3. Here is a proof of the first statement:

X·Y ⊆ Z  if and only if  ∀x∈X ∀y∈Y, x·y ∈ Z
         if and only if  ∀x∈X, x ∈ Z/Y
         if and only if  X ⊆ Z/Y

The second statement is proved similarly.
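These definitions are easy to experiment with on a finite carrier. The sketch below is mine, not the book's; it uses (N, +) cut off at 20 so that the sets stay finite, and checks the two-way inclusion just proved:

```python
# A sketch (mine): the operations X·Y, Z/Y, X\Z on subsets of a
# multiplicative system, here (N, +) restricted to 0..20.
N = range(21)
op = lambda a, b: a + b

def mul(X, Y):
    return {op(x, y) for x in X for y in Y}

def over(Z, Y):                        # Z/Y
    return {x for x in N if all(op(x, y) in Z for y in Y)}

def under(X, Z):                       # X\Z
    return {y for y in N if all(op(x, y) in Z for x in X)}

X, Y, Z = {1, 2}, {3}, {4, 5, 6}
print(over(Z, Y))                      # {1, 2, 3}
assert (mul(X, Y) <= Z) == (X <= over(Z, Y)) == (Y <= under(X, Z))
```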
Given a set Σ, we shall define a bracketed string in Σ as follows:

BS-1. all elements of Σ are bracketed strings in Σ,
BS-2. if A and B are bracketed strings in Σ, so is (AB),
BS-3. nothing is a bracketed string unless this follows from BS-1 and BS-2.

For example, if a, b and c are elements of Σ, then ((ab)c) and (a(bc)) are distinct bracketed strings of length 3. In practice, the outside parentheses are usually omitted. We may define an operation · between bracketed strings by putting A·B = (AB). Then the bracketed strings form a multiplicative system, called the free multiplicative system generated by Σ. Perhaps the syntactic calculus should really be considered as dealing
with subsets of the free multiplicative system generated by the vocabulary of the language
we wish to study, these subsets being interpreted as types.
We shall conclude this chapter by looking at a purely mathematical problem. Given elements a₁, a₂, . . ., aₙ of Σ, how many distinct bracketed strings exist which contain exactly these elements in the given order? Let f(n) be this number. When n = 1, 2, 3, and 4, we can easily calculate f(n) directly.
n    bracketed strings                          f(n)
1    a                                          1
2    (ab)                                       1
3    ((ab)c), (a(bc))                           2
4    (((ab)c)d), ((a(bc))d), ((ab)(cd)),        5
     (a((bc)d)), (a(b(cd)))
Can we obtain an explicit formula for f(n)? First we shall obtain a recursion formula. For n > 1, every bracketed string of length n has, by definition, the form (AB), where A and B are bracketed strings of lengths k and n − k, respectively, where 1 ≤ k ≤ n − 1. Therefore

(*)  f(n) = Σ_{k=1}^{n−1} f(k)f(n − k)
          = f(1)f(n − 1) + f(2)f(n − 2) + · · · + f(n − 1)f(1)
This formula does allow us to calculate f(n) inductively, but we shall do better than that. Consider the formal infinite power series in the indeterminate x:

F(x) = Σ_{n=1}^{∞} f(n)xⁿ = f(1)x + f(2)x² + · · ·
Then F(0) = 0 and we have

F(x)² = (Σ_{k=1}^{∞} f(k)xᵏ)(Σ_{l=1}^{∞} f(l)xˡ)
      = (f(1)x + f(2)x² + f(3)x³ + · · ·)(f(1)x + f(2)x² + f(3)x³ + · · ·)
      = f(1)f(1)x² + (f(1)f(2) + f(2)f(1))x³
          + (f(1)f(3) + f(2)f(2) + f(3)f(1))x⁴ + · · ·
      = f(2)x² + f(3)x³ + f(4)x⁴ + · · ·
      = Σ_{n=2}^{∞} f(n)xⁿ
      = Σ_{n=1}^{∞} f(n)xⁿ − f(1)x
      = F(x) − x
Thus

F(x)² − F(x) + x = 0

which may be solved using the quadratic formula (be careful, F(x) is the unknown, while x is the constant term!) to yield

F(x) = (1 ± (1 − 4x)^{1/2})/2

Since F(0) = 0 we must choose the minus sign. Thus

F(x) = (1 − (1 − 4x)^{1/2})/2

The expression (1 − 4x)^{1/2} can be expanded using the binomial theorem with exponent 1/2. When this is done, we find that

F(x) = 1/2 − (1/2) Σ_{n=0}^{∞} (1/2 choose n)(−4x)ⁿ

and therefore the coefficient f(n) of xⁿ is given by

(**)  f(n) = −(1/2)(1/2 choose n)(−1)ⁿ4ⁿ
We do not wish to leave the answer in this form, but we shall use the fact that the binomial coefficient

(1/2 choose n) = [(1/2)(−1/2)(−3/2) · · · (1/2 − n + 1)] / n!
               = (−1)ⁿ⁻¹ [1·1·3·5 · · · (2n − 3)] / (2ⁿ n!)

Multiplying top and bottom by 2·4·6 · · · (2n − 2), we obtain

(1/2 choose n) = (−1)ⁿ⁻¹ (2n − 2)! / (2ⁿ n! 2ⁿ⁻¹ (n − 1)!)

Substituting this into formula (**), we get

f(n) = (2n − 2)! / (n!(n − 1)!) = (1/n)(2n−2 choose n−1)

Writing n − 1 = m, we end up with the neater formula

(***)  f(m + 1) = (1/(m + 1))(2m choose m)   for all m ≥ 0

For example, taking m = 4, we calculate

f(5) = (1/5)(8 choose 4) = (1/5) · (8·7·6·5)/(1·2·3·4) = 14
The formula for /(n) was discovered by Catalan in 1838.
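As a quick numerical sanity check (mine, not part of the text), the recursion (*) and the closed formula (***) can be compared directly; this is essentially Exercise 2 below:

```python
# A sketch comparing the recursion (*) with the closed formula (***);
# f(n) counts the bracketings of n letters (the Catalan numbers).
from math import comb

def f_rec(n, memo={1: 1}):
    if n not in memo:
        memo[n] = sum(f_rec(k) * f_rec(n - k) for k in range(1, n))
    return memo[n]

def f_closed(n):                 # (***) with m = n - 1
    m = n - 1
    return comb(2 * m, m) // (m + 1)

print([f_rec(n) for n in range(1, 7)])     # [1, 1, 2, 5, 14, 42]
assert all(f_rec(n) == f_closed(n) for n in range(1, 12))
```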
Exercises

1. Show that (N, gcd) is a semigroup.

2. Calculate f(6) twice, once using the recursion formula (*) and once using the explicit formula (***).

3. For each a ∈ N, let (a) = {a·x | x ∈ N} be the set of all multiples of a. In the multiplicative system (N, ·), prove that (a)·(b) = (a·b). Hint: (a)·(b) = {a·b·x·y | x, y ∈ N}.
4. In the notation of this section and Exercise 3, show that (c)/(a) = (c/d), where d = gcd(a, c). Hint: Check that

(c)/(a) = {y ∈ N | ∀x∈(a), y·x ∈ (c)}
        = {y ∈ N | ∀u∈N, y·a·u ∈ (c)}
        = {y ∈ N | c divides y·a}

Moreover, writing c = c₁d and a = a₁d, use number theory to show that c divides y·a if and only if c₁ divides y.
3.5 Semigroups and monoids
In the previous section, we defined a semigroup as a multiplicative system (A, ·) satisfying the associative law:

(a·b)·c = a·(b·c),   for all a, b, c ∈ A

A monoid (A, ·, 1) consists of a semigroup (A, ·) together with an element 1 ∈ A such that

a·1 = a = 1·a,   for all a ∈ A
We call 1 the unity element.
The unity element is unique, since if both 1 and 1′ satisfied these equations, we would have

1 = 1·1′ = 1′

Examples of monoids are (N, +, 0), (N, ·, 1), (N, max, 0), (N, gcd, 0), and (N, lcm, 1). A little reflection will show that (N, min), (N, first), and (N, last) are not monoids.
A group (A, ·, 1, ⁻¹) consists of a monoid (A, ·, 1) together with an operation ⁻¹ : A → A such that

a·a⁻¹ = 1 = a⁻¹·a   for all a ∈ A

A group is called Abelian if also

a·b = b·a   for all a, b ∈ A

Examples of Abelian groups are the integers under addition and the positive rationals under multiplication. An example of a non-Abelian group is the set of all matrices

( a  b )
( c  d )

where a, b, c and d are integers such that ad − bc = 1, under the usual multiplication of matrices.
It has been suggested to change the notation n\s to n⁻¹·s, and to treat syntactic types as elements of a group. Indeed, the type of poor John works would then be computed as (n·n⁻¹)·n·(n⁻¹·s) = s. Unfortunately *John poor works would also have type n·(n·n⁻¹)·(n⁻¹·s) = s, and so would many other unacceptable strings of English words.
One place where we can apply the theory of groups is to the language of mathematical
physics. The grammatical sentences of this language are formulas or equations both sides
of which have the same dimension. The types or dimensions form an Abelian group. One
takes as basic dimensions L (= length), T (= time), and M (= mass). In a formula such as Fl = mv² one assigns dimensions

m → M
v → LT⁻¹
l → L
F → MLT⁻²

Using the laws of an Abelian group, we find that both sides of the given equation have dimension ML²T⁻², hence it is a grammatical sentence. This is not to say that the formula is true; the correct formula may very well read Fl = (1/2)mv². On the other hand, the equation Fl² = mv is not even a grammatical sentence.
In this manner it is customary to assign to every physical quantity a dimension MᵅLᵝTᵞ, where α, β and γ are rational numbers. Since this dimension is completely determined by the triple (α, β, γ), we see that the dimensions form an Abelian group, in fact a three-dimensional vector space over the field of rational numbers.
Most useful in applications to linguistics are the so-called free semigroup and monoid generated by a set Σ. In view of these applications, Σ will be called the vocabulary here, although it is often called the "alphabet".
A string in Σ of length n ≥ 1 is an expression a₁a₂···aₙ, where a₁, a₂, . . . , aₙ ∈ Σ. Strings are multiplied by juxtaposition:

(a₁a₂···aₙ)·(b₁b₂···bₖ) = a₁a₂···aₙb₁b₂···bₖ
With this multiplication, the set of all strings in Σ is a semigroup, called the free semigroup generated by Σ.
For reasons of elegance, people often consider the free monoid Σ* generated by Σ instead. To construct this, we only have to admit the empty string 1 of length 0, and put

A·1 = A = 1·A

for each string A ∈ Σ*.
We shall consider three examples, when Σ has no element, one element, and two elements respectively.
When Σ is the empty set, then so is the free semigroup generated by Σ, but the free monoid Σ* = {1} must contain the empty string as an element, hence is not empty.
When Σ = {a}, the free semigroup consists of all strings aⁿ = a···a (n times), where n ≥ 1. The free monoid is Σ* = {1, a, a², . . .} = {aⁿ | n ∈ N}, provided we define a⁰ = 1. Multiplication of aⁿ and aᵏ yields aⁿ·aᵏ = aⁿ⁺ᵏ; in fact the free monoid (Σ*, ·, 1) is isomorphic to (N, +, 0). (An isomorphism between two monoids is a one-to-one correspondence so that the binary operation and the unity element of one monoid correspond to those of the other.) Note that aⁿ·aᵏ = aᵏ·aⁿ, hence this Σ* is commutative.
When Σ = {σ, τ}, where σ ≠ τ, the free semigroup is

{σ, τ, σσ, στ, τσ, ττ, σσσ, . . .}

and the free monoid is the union of this and {1}. As στ and τσ are different strings, we see that Σ* does not satisfy the commutative law.
We may also consider free semigroups and monoids generated by more than two elements. Indeed, the vocabulary of English has a very large number of elements. However, in principle, for any vocabulary of more than two elements, this number can be reduced to two by coding, as we shall see.
For instance, in English we may start by reducing the vocabulary to an alphabet of 27 elements (including the space between words), and these 27 elements can be further reduced to 3 symbols using the Morse code: ·, —, and |, the last being the divider between successive letters. Finally we may represent · = τστ, — = τσστ, and | = ττ, where σ is a click and τ is a pause, as in telegraphy.
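Any such reduction just needs an injective map into the free monoid on {σ, τ} that respects juxtaposition. A cruder scheme than Morse code (my illustration, not the book's) already does the job: send the i-th word of the vocabulary to σⁱ⁺¹τ, with τ acting as a divider.

```python
# A sketch (mine): a unary coding of an arbitrary finite vocabulary into
# the two symbols σ and τ. The τ divider makes the coding injective, and
# the word-by-word definition extends it to a monoid homomorphism.

def encode(vocabulary):
    index = {w: i for i, w in enumerate(sorted(set(vocabulary)))}
    return {w: 'σ' * (i + 1) + 'τ' for w, i in index.items()}

code = encode(['the', 'man', 'hit', 'ball'])
text = ['the', 'man', 'hit', 'the', 'ball']
print(''.join(code[w] for w in text))
```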
Exercises
1. If a semigroup can be made into a monoid by selecting a unity element, show that the unity element is uniquely determined.

2. If a monoid can be made into a group by specifying the operation ⁻¹, show that x⁻¹ is uniquely determined by x.

In the following exercises, two monoids are called isomorphic if they become identical when the elements and operations are suitably renamed.

3. Prove that the monoid (N, ·, 1) is not isomorphic to the free monoid generated by the empty set.

4. Prove that the monoid (N, ·, 1) is not isomorphic to the free monoid generated by {σ}, but that the latter is isomorphic to (N, +, 0).

5. Prove that the monoid (N, ·, 1) is not isomorphic to the free monoid generated by a set Σ with two or more elements.
Chapter 4
Context-free grammars
4.1 Grammars for some fragments of English
Given a vocabulary Σ, let Σ* be the free monoid generated by Σ, that is, the set of all strings of elements of Σ, including the empty string 1. By a language in Σ we mean a subset ℒ ⊆ Σ*. We want to describe ℒ by a finite set of grammatical rules. Let us begin by looking at an example taken from English.
4.1.1 Example

Σ = {the, a, man, ball, cat, hit, took}

ℒ = {the | a} {man | ball | cat} {hit | took} {the | a} {man | ball | cat}

that is, ℒ consists of strings like the man hit the ball and a ball took a man. This language consists of a total of 2·3·2·2·3 = 72 sentences. We could adjoin a symbol S (= sentence) to the vocabulary and list 72 grammatical rules of the form:

S → the man hit the ball
A more customary method is to adjoin an auxiliary vocabulary

{S, NP, VP, Art, N, Vt}

of grammatical terms, where

S = sentence
NP = noun phrase
VP = verb phrase
Art = article
N = noun
Vt = transitive verb
Instead of the above 72 grammatical rules, only 10 are required:

S → NP VP        Art → the, a
NP → Art N       N → man, ball, cat
VP → Vt NP       Vt → hit, took

Note that, compared with the categorial grammars studied earlier, the present arrow points in the reverse direction. This is because we now take the point of view of the speaker who generates the sentence, whereas previously we took the point of view of the listener who analyses the sentence. We subject the arrow to the usual reflexive, transitive, and substitution rules:

(R)  A → A        (T)  A → B    B → C        (S)  A → B    C → D
                       --------------             --------------
                           A → C                      AC → BD
To show that the man hit the ball is a sentence, we could proceed quite formally thus:

                 Art → the    N → ball
                 ---------------------- (S)
NP → Art N       Art N → the ball
---------------------------------- (T)
        NP → the ball

                 Vt → hit    NP → the ball
                 -------------------------- (S)
VP → Vt NP       Vt NP → hit the ball
--------------------------------------- (T)
        VP → hit the ball

                 Art → the    N → man
                 --------------------- (S)
NP → Art N       Art N → the man
---------------------------------- (T)
        NP → the man

                 NP → the man    VP → hit the ball
                 ----------------------------------- (S)
S → NP VP        NP VP → the man hit the ball
----------------------------------------------- (T)
        S → the man hit the ball
In practice, this proof in tree form may be abbreviated as a so-called parsing tree, shown as follows:

              S
           /     \
         NP       VP
        /  \     /   \
      Art   N   Vt    NP
      the  man  hit  /  \
                   Art    N
                   the   ball
The reader may argue that there is too much theory just to replace 72 rules by 10.
The next example will show that a language may quite easily consist of an infinite number
of sentences (most languages do), and that grammar then becomes a matter of necessity
rather than choice.
4.1.2 Example Σ is as above with three new words added: and, or, but.
New grammatical term: Conj = conjunction.
New grammatical rules:

S → S Conj S
Conj → and, or, but.

The total number of grammatical rules is now 14, but the total number of sentences is infinite, for example

the man hit the ball and the man hit the ball and · · ·
4.1.3 Example One may argue that the conjunctions and and or can be placed not only between sentences but also between noun phrases, verb phrases and transitive verbs. This is not so for the conjunction but. To allow for this we rewrite the grammar of Example 4.1.2 in the following modified form:

S → S Conj S          Conj → and, or
NP → NP Conj NP       S → S but S
VP → VP Conj VP       Vt → Vt Conj Vt
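Grammars like these are easy to play with mechanically. The sketch below is mine, not the book's: it expands S by choosing productions at random, using the rules of Example 4.1.1 so that the expansion is sure to terminate.

```python
# A sketch (mine) of the grammar of Example 4.1.1, expanding the start
# symbol S recursively to produce a sentence.
import random

PRODUCTIONS = {
    'S':   [['NP', 'VP']],
    'NP':  [['Art', 'N']],
    'VP':  [['Vt', 'NP']],
    'Art': [['the'], ['a']],
    'N':   [['man'], ['ball'], ['cat']],
    'Vt':  [['hit'], ['took']],
}

def expand(symbol):
    if symbol not in PRODUCTIONS:              # terminal word
        return [symbol]
    body = random.choice(PRODUCTIONS[symbol])
    return [w for part in body for w in expand(part)]

random.seed(0)
print(' '.join(expand('S')))                   # e.g. "the cat took a ball"
```

With the recursive rules of Examples 4.1.2 and 4.1.3 the same procedure may run arbitrarily long, which is one way of seeing that those languages are infinite.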
The kind of grammar described in this section is known as a "context-free phrase-
structure grammar" or just a "context-free grammar". A precise definition of this concept
will be given in the next section.
Exercise
1. Give parsing trees for each of the following sentences, using the grammar of Exam-
ple 4.1.3:
the man and the cat hit the ball
the man took and hit the ball
the man hit the ball and the cat
the man hit the cat and the cat hit the ball
4.2 Production grammars and deductive systems
Before we give a precise definition of what we mean by the term "context-free grammar"
introduced in Section 4.1, we shall consider a more general concept.
4.2.1 Definition A production grammar or rewriting system G = (V, Σ, Σ′, P) consists of the following data:

PG-1. a finite set V called the vocabulary (its elements are called words);
PG-2. two nonempty subsets Σ and Σ′ of V called the terminal and initial vocabulary respectively;
PG-3. a finite subset P of V* × V*, that is, a finite set of pairs of strings of words. These pairs are called productions, and we write A → B instead of saying that (A, B) is an element of P.

These data are usually subjected to the following restrictions:

PG-4. Σ and Σ′ have no elements in common;
PG-5. Σ′ has only one element, that is, Σ′ = {S}, where S is not in Σ;
PG-6. A is not a nonempty string of terminal words;
PG-7. B is not a nonempty string of initial words.
COMMENTS.
(a) Sometimes V is called an "alphabet"; its elements are then called "letters".
(b) We are particularly reluctant to adopt the restriction PG-5. Let us see what we could do without this restriction. Σ′ might consist of sentence types: S = statement, question, command, etc.
If Σ consists of all English words, V − Σ would consist of grammatical terms such as NP, VP etc., and the productions would be S → NP VP etc. Thus G would be a grammar of English from the point of view of the speaker. However, we get another example if we take Σ′ to be the vocabulary of English, Σ the set of sentence types, and if we reverse the arrows in all of the above productions, so that a new production would be NP VP → S, for example. Then the new grammar G would be a grammar of English from the point of view of the hearer.
(c) The reason for the restriction PG-4, for example, is that, in a grammar of English from the point of view of the speaker, a terminal string such as the man hit the cat should not be rewritten.
Following Chomsky, a grammar is said to be

(0) of type 0, or a generative grammar, if it satisfies the above restrictions PG-4, PG-5, PG-6 and PG-7. Usually people are interested in grammars that satisfy further restrictions.
(1) of type 1, or a context sensitive grammar, if, in every production A → B, A is not longer (contains not more words) than B;
(2) of type 2, or a context-free grammar, if, in every production A → B, A is a nonterminal word and B is nonempty;
(3) of type 3, or a finite state grammar, if every production has the form A → a or A → aB, where A is a nonterminal word, a is a terminal word and B is any word.

Clearly, generative ⊇ context sensitive ⊇ context-free ⊇ finite state. We shall not explain the words "context-sensitive" and "context-free". In fact, we shall disregard context sensitive grammars altogether and, for the time being, confine attention to context-free grammars.
With any grammar G we associate a deductive system D(G). Its formulas are expressions of the form C → D, where C and D ∈ V*. An axiom of D(G) is a production A → B or an instance A → A of the reflexive law (R). Moreover, there are two rules of inference, the transitive law:

(T)  A → B    B → C
     ---------------
          A → C

and the substitution rule:

(S)  A → B    C → D
     ---------------
         AC → BD
Proofs are defined inductively: every axiom is a proof, and applying one of the rules
of inference to two proofs we get a new proof. The last formula in a proof is the theorem
which has been proved.
Proofs are usually written in tree form; see for example the proof of the formula

S → the man hit the ball

in Section 4.1. This formula is therefore a theorem in the deductive system D(G), where G is the grammar of Section 4.1, Example 4.1.1. The substitution rule (S) is sometimes given a different formulation:

(S′)     A → B
      ------------
      CAD → CBD

It is easy to derive (S′) from (S) as follows:

C → C (R)    A → B
------------------- (S)
     CA → CB            D → D (R)
     ----------------------------- (S)
             CAD → CBD

Conversely, one may derive (S) from (S′). Note that AC = 1AC and BC = BC1.

  A → B                 C → D
---------- (S′)       ---------- (S′)
 AC → BC               BC → BD
-------------------------------- (T)
            AC → BD
Grammatical facts are expressed by formulas which can be proved in the deductive system D(G). We are particularly interested in theorems of the form S → E, where S is an initial word and E ∈ Σ* is a string in the terminal vocabulary. The language or set of sentences generated by the grammar G is the set ℒ(G) consisting of all E ∈ Σ* for which S → E is provable in D(G).
In the examples of Section 4.1, Σ is a fragment of the vocabulary of English, V − Σ consists of grammatical terms, Σ′ = {S}, and the productions are the grammatical rules stated there. We shall now give two other illustrations of certain artificial languages, built up from a vocabulary consisting of two words, say σ and τ.
4.2.2 Example

ℒ₁ = {στ, σσττ, σσστττ, . . .}
   = {σⁿτⁿ | n ≥ 1}
We wish to find a context-free grammar G₁ so that ℒ₁ = ℒ(G₁). Take V = {σ, τ, S} and let P consist of the following two productions:

P-1. S → στ,
P-2. S → σSτ.

Here, for example, is a proof of the formula S → σσττ, showing that σσττ ∈ ℒ₁:

                  S → στ (P-1)
                  -------------- (S′)
S → σSτ (P-2)     σSτ → σσττ
-------------------------------- (T)
          S → σσττ

It can easily be seen that, for any string E ∈ Σ*, E ∈ ℒ₁ if and only if S → E is deducible from P-1 and P-2, that is, provable in the deductive system D(G₁). While this fact ought to be clear intuitively, a proof will be given in Section 4.7, just to show that it is possible to give such a proof.
4.2.3 Example

Σ = {σ, τ}, σ ≠ τ
ℒ₂ = {σσ, ττ, σττσ, τσστ, . . .}
   = {AA♯ | A ∈ Σ* and A ≠ 1}

where A♯ is the mirror image of A. That is to say, if A = a₁a₂ . . . aₙ₋₁aₙ, where the aᵢ ∈ Σ, then A♯ = aₙaₙ₋₁ . . . a₂a₁. We wish to find a context-free grammar G₂ such that ℒ₂ = ℒ(G₂).
Again take V = {σ, τ, S} and let P consist of the following productions:

P-3. S → σσ,
P-4. S → ττ,
P-5. S → σSσ,
P-6. S → τSτ.

It can now be easily seen that a string E ∈ Σ* is a sentence of ℒ₂ if and only if S → E is deducible from P-3 to P-6.
REMARK. The language ℒ₁′ = {1, στ, σσττ, . . .} = {σⁿτⁿ | n ≥ 0} can be described more elegantly by a grammar with productions S → 1 and S → σSτ. However, ℒ₁′ is not the same language as ℒ₁: it has one additional sentence, the empty string, and the grammar is not context-free. Similarly the language ℒ₂′ = {AA♯ | A ∈ Σ*} can be described more elegantly than ℒ₂ by a grammar with productions S → 1, S → σSσ and S → τSτ. This grammar is also not context-free.
Exercises

1. Deduce S → σσστττ for ℒ₁.

2. Deduce S → τσττστ for ℒ₂.

3. Find a context-free grammar that will generate the language

{σⁿρτⁿ | n ≥ 1}

for the terminal vocabulary Σ = {ρ, σ, τ}.

4. What language is generated by the context-free grammar

(V, {σ, τ}, {S}, P)

with vocabulary

V = {σ, τ, S, U, V}

and productions

P = {U → σ, U → σU, V → τ, V → τV, S → UV}?
4.3 Grammars of some formal languages
Context-free grammars are useful not only for describing natural languages such as Eng-
lish, but also for describing certain formal languages created by mathematicians. In
particular, we shall discuss three such languages: the propositional calculus, the language
of set theory, and a programming language for the abacus.
4.3.1 Example The propositional calculus. Its vocabulary is usually presented as follows:

Variables: p, q, r, . . .
Connectives: ¬, ⇒, ∧, ∨
Parentheses: (, ).

The set of sentences (= formulas) is usually described inductively as follows.

F-1. Every variable is a sentence.
F-2. If A is a sentence so is ¬A.
F-3. If A and B are sentences so are (A ⇒ B), (A ∨ B), and (A ∧ B).
F-4. Nothing else is a sentence.
This is to say, the set of sentences is the smallest set of strings in the given vocabulary which contains the variables (F-1) and is closed under F-2 and F-3. For example, we have the following two sentences with their construction:

p (1)   q (1)              q (1)   r (1)
-------------- (3)         -------------- (3)
  (p ⇒ q)      r (1)       p (1)  (q ∨ r)
  ------------------ (3)   ------------------ (3)
   ((p ⇒ q) ∨ r)            (p ⇒ (q ∨ r))

Usually people omit outside parentheses. Thus the above two sentences may be written without ambiguity:

(p ⇒ q) ∨ r,    p ⇒ (q ∨ r)
However it won't do to omit parentheses altogether, as the expression p ⇒ q ∨ r is ambiguous.
Actually, the right parentheses are redundant, and one could still distinguish the following:

((p ⇒ q ∨ r,    (p ⇒ (q ∨ r

Parentheses may be redundant, but they do facilitate reading. In English, ambiguity is avoided in a similar fashion. We say

if p then q      in place of   (p ⇒ q
either p or q    in place of   (p ∨ q
both p and q     in place of   (p ∧ q

The above two formulas are transcribed into English thus:

either if p then q or r,    if p then either q or r
The second formation is more common than the first. The reason for this lies in the
limited capacity of the short term memory in the human brain.
A word should be said about the bracketless or "Polish" notation. One may avoid parentheses altogether by writing connectives in front, thus ⇒pq, ∨pq, ∧pq in place of (p ⇒ q), (p ∨ q), (p ∧ q). The above two formulas would become in Polish notation:

∨⇒pqr,    ⇒p∨qr

Economy of writing and ease of reading do not always seem to agree! However, it is not clear whether the difficulty is due to our lack of familiarity or whether this notation is inherently harder to understand. After all, we use a form of this in vertical mode when we learn arithmetic in elementary school.
After this digression, let us return to the propositional calculus and obtain a context-
free grammar for it.
First, we must make the vocabulary finite, so let us replace p, q, r, . . . by p, p′, p″, . . . Thus we have

Σ = {p, ′, ¬, ⇒, ∧, ∨, (, )}
V − Σ = {S, V, C}

where

S = sentence
V = propositional variable
C = binary connective

The productions are the following:

V → p          S → V
V → V′         S → ¬S
C → ⇒, ∨, ∧    S → (S C S)

Here, for example, is a parsing tree for the sentence (p ⇒ (p′ ∨ p″)):

              S
           /  |  \
          S   C   S
          |   |   / | \
          V   ⇒  S  C  S
          |      |  |  |
          p      V  ∨  V
                /|    /|
               V ′   V ′
               |    /|
               p   V ′
                   |
                   p

We have not indicated the symbols "(" and ")" explicitly on the tree.
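Because each production here is determined by the first symbol it can produce, membership in this language can be checked by a simple one-pass recognizer, with no backtracking at all. The sketch below is mine, not the book's:

```python
# A sketch (mine) of a recognizer for the grammar above:
# S → V | ¬S | (S C S),  V → p | V′,  C → ⇒ | ∨ | ∧.

def parse_S(s, i=0):
    """Return the position just after one S starting at i, or -1."""
    if i < len(s) and s[i] == '¬':                 # S → ¬S
        return parse_S(s, i + 1)
    if i < len(s) and s[i] == '(':                 # S → (S C S)
        j = parse_S(s, i + 1)
        if j < 0 or j >= len(s) or s[j] not in '⇒∨∧':
            return -1
        k = parse_S(s, j + 1)
        if k < 0 or k >= len(s) or s[k] != ')':
            return -1
        return k + 1
    if i < len(s) and s[i] == 'p':                 # S → V, V → p, V → V′
        i += 1
        while i < len(s) and s[i] == '′':
            i += 1
        return i
    return -1

def is_sentence(s):
    return parse_S(s) == len(s)

print(is_sentence('(p⇒(p′∨p′′))'))   # True
print(is_sentence('pq⇒(′)'))         # False
```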
4.3.2 Example The language of set theory. As all mathematics can be reduced to set theory, it is now fashionable to regard this also as the language of mathematics. We present a context-free grammar for set theory.

Σ = {x, ′, =, ∈, ¬, ⇒, ∧, ∨, ∀, ∃, (, )}
V − Σ = {S, V, Q, C}

where

S = sentence
V = individual variable
Q = quantifier
C = binary connective

Productions are the following:

V → x          S → V = V
V → V′         S → V ∈ V
Q → ∀V         S → ¬S
Q → ∃V         S → (S C S)
C → ⇒, ∨, ∧    S → Q S

We give an example of a parsing tree for the sentence:

∀x′ ((x = x) ∨ ¬(x ∈ x′))

              S
            /   \
           Q     S
          / \   / | \
         ∀  V′  S  C  S
             |  |  |  |
             x  |  ∨  ¬ S
              x = x     |
                      x ∈ x′

where the subtrees for x = x and x ∈ x′ (built from V → x and V → V′) have been abbreviated. We have not indicated the symbols "(" and ")" explicitly.
Of course, the above grammar does not tell us everything we want to know about the
language of mathematics. It does not distinguish between open sentences, which contain
free variables, and closed sentences, which do not. Above all, it does not generate the one
set we are really interested in, the set of theorems.
4.3.3 Example A programming language for the abacus. For purposes of illustration, recall the program: transfer content of A to B.

start
  ↓
A⁻ (node 1) --else--> stop
  ↓
B⁺ (node 2)
  ↓ (back to node 1)

We may regard this program as consisting of the following instructions:

start : go to 1
1 : A⁻ go to 2 else go to stop
2 : B⁺ go to 1.
We wish to consider these instructions as sentences of a language.
To ensure a finite vocabulary, we replace A, B, C, . . . by A, A′, A″, . . . and 1, 2, 3, . . . by 1, 1′, 1″, . . .

Σ = {start, stop, A, 1, ′, +, −, go to, else, :}
V − Σ = {S, L, N, M}

where

S = sentence (= instruction)
L = location
N = node other than start or stop
M = node other than start.
Productions:

L → A          N → 1
L → L′         N → N′
M → N          M → stop
S → start : go to M
S → N : L⁺ go to M
S → N : L⁻ go to M else go to M
Exercises
1. Write out a parsing tree for ((p ⇒ p′) ∨ p″).
2. Along the lines of Example 4.3.3 above, what instructions are needed for the program:
copy content of A into B.
4.4 New languages from old
Given a terminal vocabulary Σ and a language ℒ ⊆ Σ*, we may form two new languages in the same vocabulary:

−ℒ = {A ∈ Σ* | A ∉ ℒ}
ℒ♯ = {A♯ ∈ Σ* | A ∈ ℒ}

−ℒ is the complement of ℒ and ℒ♯ is the mirror image of ℒ. Given two languages ℒ and ℒ′ in the same terminal vocabulary Σ, we may form three new languages as follows:

ℒ ∪ ℒ′ = {A ∈ Σ* | A ∈ ℒ ∨ A ∈ ℒ′}
ℒ ∩ ℒ′ = {A ∈ Σ* | A ∈ ℒ ∧ A ∈ ℒ′}
ℒℒ′ = {AA′ ∈ Σ* | A ∈ ℒ ∧ A′ ∈ ℒ′}

ℒ ∪ ℒ′ is the union, ℒ ∩ ℒ′ is the intersection, and ℒℒ′ is the product of the given languages. As an example of the product of two or more languages we may form

ℒ² = {AB ∈ Σ* | A, B ∈ ℒ}
ℒ³ = {ABC ∈ Σ* | A, B, C ∈ ℒ}

and so on. Finally, we are able to construct

ℒ* = {A ∈ Σ* | ∃n ≥ 1, A ∈ ℒⁿ}

Thus ℒ* consists of all strings which are made up by juxtaposing any number of strings of ℒ; that is, the sentences of ℒ* are the texts of ℒ. For example, if ℒ = {a}, then ℒ² = {a²}, ℒ³ = {a³}, and ℒ* = {a, a², a³, . . .}.
4.4.1 Theorem If ℒ and ℒ′ have context-free grammars then so do ℒ♯, ℒ*, ℒ ∪ ℒ′, and ℒℒ′.

Proof We shall assume that ℒ has vocabulary V, initial element S ∈ V − Σ, and productions P. We also assume that ℒ′ has vocabulary V′, initial element S′ ∈ V′ − Σ, and productions P′. Without loss in generality, we may also assume that V − Σ has no symbols in common with V′ − Σ; otherwise we could change the notation to make this so. Thus V ∩ V′ = Σ.
(1) For the grammar of ℒ ∪ ℒ′ we take the vocabulary V ∪ V′ ∪ {T}, assuming that T is a new symbol, not in V or V′. T will be the initial element for ℒ ∪ ℒ′. As productions we take P ∪ P′ ∪ {T → S, T → S′}. It is easily seen that, for any string E ∈ Σ*, T → E is deducible from these productions if and only if E ∈ ℒ ∪ ℒ′.
(2) For the grammar of ℒℒ′ we take the same vocabulary as in (1), and the set of productions P ∪ P′ ∪ {T → SS′}. It is easily seen that, for any string E ∈ Σ*, T → E is deducible from these productions if and only if E ∈ ℒℒ′.
(3) ℒ* has sentences of the form A₁A₂···Aₖ, where A₁, A₂, . . ., Aₖ ∈ ℒ, and k ≥ 1. We take as vocabulary V ∪ {T}, where T is a new symbol that will serve as the initial element. We take as productions the set P ∪ {T → S, T → ST}. It follows that T → S², T → S³, etc. It is easily seen that for any string E ∈ Σ*, T → E is deducible from the above productions if and only if E ∈ ℒ*.
(4) For the grammar of ℒ♯ we take the same vocabulary as for ℒ but replace the productions of ℒ by P♯, defined so that P♯ contains the production A♯ → B♯ for every production A → B of P. It is easily seen that, for any string E ∈ Σ*, S → E is deducible from these new productions if and only if S = S♯ → E♯ was deducible from the old productions, that is, E♯ ∈ ℒ, that is, E ∈ ℒ♯.
Up to verifying the statements about which it was said that they could easily be seen, our proof is now complete.
To illustrate this theorem and its proof, we recall the languages ℒ₁ and ℒ₂ of Section 4.2 as well as their grammars. To make sure that the auxiliary vocabularies have no symbols in common, we replace their initial elements by S₁ and S₂ respectively.
The productions for ℒ₁ were:

S₁ → στ,    S₁ → σS₁τ.

The productions for ℒ₂ were:

S₂ → σσ,    S₂ → ττ,    S₂ → σS₂σ,    S₂ → τS₂τ.

The productions for ℒ₁ ∪ ℒ₂ now are:

S₁ → στ,    S₁ → σS₁τ,    S₂ → σσ,    S₂ → ττ,
S₂ → σS₂σ,    S₂ → τS₂τ,    T → S₁,    T → S₂.

T serves as the initial symbol.
The productions for ℒ₁ℒ₂ are:

S₁ → στ,    S₁ → σS₁τ,    S₂ → σσ,    S₂ → ττ,
S₂ → σS₂σ,    S₂ → τS₂τ,    T → S₁S₂.

The productions for ℒ₁* are:

S₁ → στ,    S₁ → σS₁τ,    T → S₁,    T → S₁T.

The productions for ℒ₁♯ are:

S₁ → τσ,    S₁ → τS₁σ.

The initial symbol here is still S₁.
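The constructions in parts (1) and (2) of the proof are purely mechanical, as the following sketch (mine, with each production stored as a list of right-hand sides, each a list of symbols) illustrates:

```python
# A sketch (mine) of the union and product constructions of Theorem 4.4.1.
G1 = {'S1': [['σ', 'τ'], ['σ', 'S1', 'τ']]}                   # grammar of L1
G2 = {'S2': [['σ', 'σ'], ['τ', 'τ'],
             ['σ', 'S2', 'σ'], ['τ', 'S2', 'τ']]}             # grammar of L2

def union_grammar(g1, s1, g2, s2):
    return {**g1, **g2, 'T': [[s1], [s2]]}      # add T → S1, T → S2

def product_grammar(g1, s1, g2, s2):
    return {**g1, **g2, 'T': [[s1, s2]]}        # add T → S1S2

print(union_grammar(G1, 'S1', G2, 'S2')['T'])   # [['S1'], ['S2']]
```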
The reader may ask why we did not include in the theorem assertions that −ℒ and ℒ ∩ ℒ′ have context-free grammars. It will in fact be shown in Section 4.8 that there exist languages ℒ and ℒ′ with context-free grammars for which ℒ ∩ ℒ′ does not have a context-free grammar.
Assuming that, we shall now prove that the complement of a language with a context-free grammar does not necessarily have a context-free grammar. Indeed, suppose, on the contrary, that the complement of any language with a context-free grammar again had a context-free grammar. Let ℒ and ℒ′ be any two languages (with the same terminal vocabulary) which have context-free grammars. Then

ℒ ∩ ℒ′ = −(−ℒ ∪ −ℒ′)

would also have a context-free grammar, which is a contradiction.
Exercise
1. Find context-free grammars for the following languages:

ℒ₂*, ℒ₁♯, ℒ₁ ∪ ℒ₂, ℒ₁♯ℒ₂, (ℒ₁ℒ₂)*, (ℒ₁ ∪ ℒ₂)♯

Here ℒ₁ and ℒ₂ are the languages introduced in Section 4.2.
4.5 How are sentences produced and recognized?
It is highly unlikely that a person while engaged in uttering a sentence will construct a
parsing tree in his mind or even carry out a proof in a deductive system. Let us look at
an example, based on the grammar of Section 4.1.
                                                  n
S → NP VP                                         2
  → Art N VP                                      3
  → the N VP                                      2
  → the man VP                                    1
  → the man VP Conj VP                            3
  → the man Vt NP Conj VP                         4
  → the man took NP Conj VP                       3
  → the man took Art N Conj VP                    4
  → the man took the N Conj VP                    3
  → the man took the ball Conj VP                 2
  → the man took the ball and VP                  1
  → the man took the ball and Vt NP               2
  → the man took the ball and hit NP              1
  → the man took the ball and hit Art N           2
  → the man took the ball and hit the N           1
  → the man took the ball and hit the cat         0
Note that in going from line 1 to line 2, we apply the production NP → Art N rather than the production VP → VP Conj VP. Again, in going from line 2 to line 3, we apply the production Art → the rather than N → man or VP → VP Conj VP. In other words, we replace nonterminal words as soon as possible, starting from left to right.
We note that at each stage of the generation of the above sentence, part of the sentence
has already been uttered, while the rest only exists in the form of a plan. The plan consists
of a string of grammatical terms. The number n of these terms varies, as we go along, from
2 to 0, the maximum in the above example being 4. We may think of these grammatical
terms as being stored temporarily in the short term memory of the human brain.
Psychologists have found that the short term memory is capable of storing at most
seven "chunks" of information. They define one chunk as a nonsense word. However,
adapting a proposal by Victor H. Yngve to our present terminology, we are tempted to
say that each element of the vocabulary counts as one chunk. Yngve himself believed that
the grammatical rules of a natural language would automatically rule out n > 7. This is
not quite true, as the following example shows.
Let us look at the fragment of English consisting of the words:
John, Jane, Tom, works, rests, complains, if, then
together with the following productions:
S → NP VP
NP → John, Jane, Tom
VP → works, rests, complains
S → if S then S.
We shall now form two sentences, which in logical notation have the forms p ⇒ (q ⇒ (r ⇒ s)) and ((p ⇒ q) ⇒ r) ⇒ s:
(1) If John works then if Jane rests then if Tom works then Jane complains
(2) If if if John works then Jane rests then Tom works then Jane complains
One may conceivably utter (1), but uttering (2) is inconceivable, although grammatically they are both well-formed. Why is this so? Let us see how (1) would actually be uttered.
                                          n
S → if S then S                           3
  → if NP VP then S                       4
  → if John VP then S                     3
  → if John works then S                  1
  → if John works then if S then S        3

We note that while uttering if, the three items "S then S" must be put into temporary storage. The word then, while belonging to the terminal vocabulary, cannot be uttered until after the word works has been uttered. We easily check that n ≤ 4 throughout the entire process of generating the sentence (1).
The situation is different with (2).
                                                        n
S → if S then S                                         3
  → if if S then S then S                               5
  → if if if S then S then S then S                     7
  → if if if NP VP then S then S then S                 8
  → if if if John VP then S then S then S               7
  → if if if John works then S then S then S            5
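The bookkeeping in these tables is easy to mechanize. The following sketch is mine, not the book's: it expands the leftmost grammatical term, utters whatever terminals reach the front of the plan, and reports the storage load n after each expansion.

```python
# A sketch (mine): simulate left-to-right utterance and count the symbols
# remaining in temporary storage after each production step.
TERMINALS = {'if', 'then', 'John', 'works'}

def trace(plan, expansions):
    sizes = []
    for rhs in expansions:
        i = next(j for j, x in enumerate(plan) if x not in TERMINALS)
        plan = plan[:i] + rhs + plan[i + 1:]       # one production step
        while plan and plan[0] in TERMINALS:       # utter leading terminals
            plan = plan[1:]
        sizes.append(len(plan))                    # chunks still in storage
    return sizes

# sentence (2): apply S → if S then S three times
print(trace(['S'], [['if', 'S', 'then', 'S']] * 3))   # [3, 5, 7]
```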
If our psychological hypothesis is correct, a speaker will get stuck as soon as n ≥ 8.
One possible conclusion from this is that some grammatically possible sentences are never uttered, because they would overload the temporary storage capacity of the human brain. We shall drive home the point by considering another example. Let us add to the
previous grammar the following productions:
Adj → nice, known, true
S → it is Adj that S
S → that S is Adj
The following sentence causes no difficulty:
(1) It is true that it is known that it is true that Jane works
However, the following sentence is inconceivable:
(2) That that that Jane works is true is known is nice
Indeed, at one stage of its attempted generation we would have:
                                                     n
S → that that that NP VP is Adj is Adj is Adj        8
REMARK. We should point out the difference between "chunk of information" and "bit of information". The latter has a technical significance: bit = binary digit. If we think of a chunk as being a four letter word, each letter having 26 possibilities, say approximately 2⁵, then 1 chunk is approximately 4 · 5 = 20 bits.
Commentary by MB
On the other hand, as the following example shows, an explanation in terms of the number of
"chunks" to be stored is over-simplified. The following sentence is not a model of graceful style,
but still can be processed, while in the course of processing the number of such chunks generated
is as much as nine. The sentence uses yet another rule:
S → both S and S
which generates such sentences as
both John is late and Jane is early
and the one that interests us
If both that Jane works is true and that John works is false then Sue sleeps
                                           n
if S then S                                3
if both S and S then S                     5
if both that S is Adj and S then S         7
if both that NP VP is Adj and S then S     8
Consideration of this and of sentences that are even worse from the point of view of chunks of information (using the productions S → either S or S and S → neither S nor S) suggests quite a different interpretation of these facts; namely that there are specific processors in the brain that process such constructions and that these processors have no stack. Thus you cannot interrupt the if S then S processor for another construction of the same type, although there is no especial problem with using another type of complex construction that uses a different processor.
Having formed some idea how sentences are produced, we are led to ask how strings
of words are recognized as sentences. Let us take the same example as before, now from
the point of view of the hearer:
                                           n
the man took the ball and hit the cat
← Art man . . .                            1
← Art N . . .                              2
← NP took . . .                            1
← NP Vt the . . .                          2
← NP Vt Art ball . . .                     3
← NP Vt Art N . . .                        4
← NP Vt NP . . .                           3
← NP VP and . . .                          2
← NP VP Conj hit . . .                     3
← NP VP Conj Vt the . . .                  4
← NP VP Conj Vt Art cat                    5
← NP VP Conj Vt Art N                      6
← NP VP Conj Vt NP                         5
← NP VP Conj VP                            4
← NP VP                                    2
← S                                        1
A number of comments are in order:
1. We have used the same grammar that had been used for producing the sentence. It might have been tempting to reverse the arrows, thus operating in the "dual" grammar with productions Art N → NP, and so on. However, this dual grammar is not context-free (or even generative, as it has more than one initial word).
2. After line 9 it would have been tempting to write ← S and · · ·, but then we would have ended up with ← S Conj VP, and there is no production which will justify our replacing this by S. Presumably a human hearer makes use of other cues, for instance intonation, to see that the sentence is not yet complete.
3. The numeral on the right denotes the number of grammatical terms that are held in temporary storage at each stage. The number n = 6 in line 13 is dangerously close to the upper bound of 7. In fact, when we analyze the sentence

the man took the ball and hit the cat and the dog

we do get

                                           n
← NP VP Conj Vt NP and                     5
← NP VP Conj Vt NP Conj the                6
← NP VP Conj Vt NP Conj Art dog            7
← NP VP Conj Vt NP Conj Art N              8
What is puzzling is that there are 8 chunks of information in storage, and yet there is no apparent difficulty in someone's recognizing the above as a sentence. One explanation comes by replacing the production NP → NP Conj NP by two productions:

NP → NP* NP    and    NP* → NP Conj.

Similarly we can replace VP → VP Conj VP by two productions.
Exercises
In each of the following exercises, count the chunks of information in storage at each stage.
1. Show how a person might produce the sentence

the man hit the cat and the dog
2. Show how a person might recognize the string
the man took the ball and hit the cat and the dog
as a sentence, utilizing the suggestion made in the last paragraph above.
4.6 Machines for producing and recognizing sentences
We shall construct a machine that can produce the sentences of a language which is generated by a normal context-free grammar, where all productions have one of the two forms:

T → W        T → UV
The machine we have in mind is a variation of what is usually called a "pushdown automaton". It consists of two tapes: a storage tape capable of moving in both directions and an output tape allowed only to move to the left. There is a scanner which can read and write on both tapes. Both tapes are inscribed with words of the same vocabulary V, the total vocabulary of the grammar to be considered.

storage tape:   . . . | | | S | | | . . .
                          scanner
output tape:    . . . | | |   | | | . . .

We have shown the machine at the beginning of an operation: all squares but one are blank; only the scanned square of the storage tape is imprinted with the symbol S, representing the fact that the machine plans to utter a sentence.
The machine is non-deterministic: a given situation may give rise to several possible moves; in fact, there is usually more than one way of completing a sentence. We now describe the moves of the machine, which depend on the given normal context-free grammar.

(1) If the scanned square of the output tape is blank and the scanned square of the storage tape has T written on it, the machine prints T on the scanned square of the output tape and erases it on the storage tape.

(2) If both scanned squares are blank, the storage tape moves one square to the left.

(3) If both scanned squares have words written on them, the storage tape moves one square to the right.

(4) T on the storage tape is replaced by W, provided T → W is a production of the grammar.

(5) T on the storage tape is replaced by UV, with U on the scanned square and V on the square to its right, provided T → UV is a production of the grammar.

(6) If the scanned square of the output tape contains a word T of the terminal vocabulary Σ, the output tape moves one square to the left.
We shall illustrate the operation of the machine for the fragment of English discussed in Section 4.1, Example 4.1.3. Unfortunately, a production such as

NP → NP Conj NP

does not belong in a normal grammar, so we replace it by two productions:

NP → NP NP′
NP′ → Conj NP
We shall describe in detail how our machine would utter the sentence:
the man hit the cat and the ball
according to the following parsing tree:

              S
           /     \
         NP       VP
        /  \     /  \
      Art   N   Vt   NP
      the  man  hit /  \
                  NP    NP′
                 /  \   /  \
               Art   N Conj  NP
               the  cat and /  \
                          Art   N
                          the  ball
In describing the successive situations of the machine, we shall only indicate what is written on the two scanned squares and on a few relevant squares to the right of the scanned square of the storage tape. In fact, all squares to the left of the scanned square of the storage tape are blank, and so are all squares to the right of the scanned square of the output tape. The squares of the output tape to the left of the scanned square contain nothing but past utterances. At no time need the machine remember what it has already uttered. Its memory (= storage tape) need only contain the plan of what will be uttered in future. Well, here are the successive situations of the machine, showing at each stage the plan remaining on the storage tape and the word, if any, being uttered:
plan on storage tape         word uttered
S
NP VP
Art N VP
the N VP
N VP                         the
man VP
VP                           man
Vt NP
hit NP
NP                           hit
NP NP′
Art N NP′
the N NP′
N NP′                        the
cat NP′
NP′                          cat
Conj NP
and NP
NP                           and
Art N
the N
N                            the
ball
(blank)                      ball
The machine does not come to a stop, but the storage tape moves to the left and keeps
on doing so, until a nonempty square is scanned, or until we come to the end of the tape.
It will have been noticed that the entire operation above required only three squares
of the storage tape. One can easily check that the grammar of the fragment of English
under consideration does not require more than a very small number of squares of the
storage tape. For all practical purposes then, our machine is a finite automaton.
Let us see what happens if we adjoin to our grammar the production

S → if S then S

To make sure the grammar is normal, we must replace this by three productions:

S → if S′,
S′ → S S″,
S″ → then S.
As we try to generate sentences that begin with if, if if, if if if, etc., we see that more and
more squares of the storage tape are required. It is a matter of experience that a human
speaker will get stuck when he tries to utter a sentence starting with if if if. This shows
that whatever corresponds to the storage tape in the human brain has in fact a quite
small capacity.
Surprisingly, the same machine can also be used for recognizing the sentences generated
by a normal context-free grammar. By this we mean: if a terminal string is entered on
the output tape (better called "input tape" now) to the right of the scanned square, the
machine will erase this string and convert it to the initial symbol S on the scanned square
of the storage tape, provided it adopts the right strategy.
We adopt the following moves:

(Display of the move schemas (1') to (5'), essentially the moves (1) to (5) read in reverse; move (3') carries the proviso that W → T is a production of the grammar, and move (4') the proviso that W → UV is a production of the grammar.)
The fourth move, for example, means: erase both scanned words, print W on the output
tape, and move the storage tape one square to the right.
In certain situations the machine may find itself in a quandary. Suppose, for example, that the scanned square of the storage tape contains U and the scanned square of the output tape contains V. When there is no production W → UV, only move (5') is applicable. When there is a production W → UV, we prefer to make move (4'), but it may be a bad move, as we shall see in an example.
We assume the sentence the man hit the cat and the ball is entered on the input tape,
to the right of the scanned square. We shall not show what is written on the input tape,
except on the scanned square. Here are successive situations of the machine:
(Display of the successive situations of the machine as it reads and reduces the man hit the cat.)
At this stage we are tempted to make move (4'), but we shall make move (5') instead (we
abbreviate Conj by C):
(Display of the remaining situations, ending with the single symbol S on the storage tape.)
What would have happened if we had followed our first inclination and made move (4') instead of (5')?
(Display of the resulting situations, ending with the pair S, NP')
and there's nowhere to go from here.
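In modern terms, the quandary is a shift-reduce conflict, and "adopting the right strategy" amounts to backtracking. A minimal sketch (ours), in which the reductions, moves (3') and (4'), are tried before a shift, move (5'):

```python
# A sketch of the recognizing strategy as backtracking shift-reduce parsing:
# the storage tape is a stack, move (1') shifts the next input word,
# move (3') reduces by a production W -> T, move (4') by a production W -> UV.
UNARY = {"the": "Art", "man": "N", "cat": "N", "ball": "N",
         "hit": "Vt", "and": "Conj"}
BINARY = {("Art", "N"): "NP", ("Vt", "NP"): "VP", ("NP", "VP"): "S",
          ("Conj", "NP"): "NP'", ("NP", "NP'"): "NP"}

def recognize(stack, words):
    if stack == ["S"] and not words:
        return True               # the storage tape holds S alone: accept
    if stack and stack[-1] in UNARY:                      # move (3')
        if recognize(stack[:-1] + [UNARY[stack[-1]]], words):
            return True
    if len(stack) >= 2 and tuple(stack[-2:]) in BINARY:   # move (4')
        if recognize(stack[:-2] + [BINARY[tuple(stack[-2:])]], words):
            return True
    if words:                                             # move (5'): shift
        return recognize(stack + [words[0]], words[1:])
    return False

print(recognize([], "the man hit the cat and the ball".split()))   # True
```

The premature reduction to S is tried first and fails, exactly as in the run above; the backtracking then finds the shift that saves the parse.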
Exercise
1. Show how a machine would (a) produce, (b) recognize the sentence σ^2 τ^2 in the language L_1 of Section 4.2, Example 4.2.2. (Hint: first replace the original context-free grammar of L_1 by a normal one.)
4.7 Derivations and parsing trees
For many purposes, proofs in tree form are better replaced by "derivations". A derivation of C → D is a finite sequence of strings

C = C_0, C_1, ..., C_n = D

where

C_i = E_i A_i F_i,   C_{i+1} = E_i B_i F_i

and A_i → B_i is a production, for each i = 0, 1, ..., n-1. We say that this derivation has n steps.
By a derivation of 0 steps we mean an application of the reflexive law: C → C = D. A derivation of 1 step has the form

C = EAF,   EBF = D

where A → B is a production. Note that C → D can then be proved by the substitution rule (S'). We shall now give an example of a 2-step derivation. Suppose A_0 → B_0 and A_1 → B_1 are productions and C = E_0 A_0 G A_1 F_1, D = E_0 B_0 G B_1 F_1. Then we have the derivation

C = C_0 = E_0 A_0 (G A_1 F_1) → C_1 = E_0 B_0 (G A_1 F_1) = (E_0 B_0 G) A_1 F_1 → C_2 = (E_0 B_0 G) B_1 F_1 = D
In this example it is clear that we can prove C → D as follows:

A_0 → B_0
--------------------------------------- (S')
E_0 A_0 G A_1 F_1 → E_0 B_0 G A_1 F_1

A_1 → B_1
--------------------------------------- (S')
E_0 B_0 G A_1 F_1 → E_0 B_0 G B_1 F_1

and, by (T),

E_0 A_0 G A_1 F_1 → E_0 B_0 G B_1 F_1
Quite generally, if C → D has a derivation, then it also has a proof in tree form. Indeed, in view of reflexivity and transitivity, we need only recall that this is so for 1-step derivations. We have thus established one half of the following:

4.7.1 Theorem  A formula C → D is provable in the deductive system D(G) if and only if it has a derivation.
It remains to prove the implication the other way. Suppose C' → D' is provable, that is, possesses a proof in tree form. We shall show by induction on the length of this proof that there is a derivation of C' → D'. If C' = D', we have an instance of the reflexive law, hence a 0-step derivation.

If C' → D' is a production, there is a 1-step derivation.

Suppose C' → D' is obtained by substitution thus:

C → D
------------------------------ (S')
C' = ECF → EDF = D'

By inductional assumption, C → D has a derivation, say:

C = C_0, C_1, ..., C_n = D
Then

C' = EC_0F, EC_1F, ..., EC_nF = D'

is also a derivation, as is easily checked.

Finally, suppose C' → D' is obtained by the transitive law thus:

C' → E     E → D'
------------------ (T)
C' → D'

Juxtaposing derivations of C' → E and E → D', which may be assumed to exist by inductional assumption, we obtain a derivation of C' → D'. ∎

As an immediate consequence of the theorem we have the following:

4.7.2 Corollary  If G = (V, Σ, Σ', P) is a grammar, then the language L(G) consists of all strings E ∈ Σ* for which there exists a derivation S → E for some S ∈ Σ'.

Note that in generative grammars, there is a unique S ∈ Σ'.
We may now return to Example 4.2.2 of Section 4.2. We wish to determine the language L(G_1), where G_1 is the grammar with productions

S → στ,   S → σSτ

Thus we seek all possible derivations S → E with E ∈ {σ, τ}*. There cannot be a 0-step derivation of S → E, since S is not an element of Σ*. If there is a 1-step derivation, it must have the form

S = C_0 = E_0 A_0 F_0,   C_1 = E_0 B_0 F_0 = E

where A_0 → B_0 is a production. Clearly, A_0 = S, B_0 = στ and E_0 = F_0 = 1. If there is a 2-step derivation S = C_0, C_1, C_2 = E, we require that

C_0 = E_0 A_0 F_0,   C_1 = E_0 B_0 F_0 = E_1 A_1 F_1,   C_2 = E_1 B_1 F_1

where A_0 → B_0 and A_1 → B_1 are productions. Clearly, then A_0 = A_1 = S, hence E_0 = F_0 = 1, and B_0, B_1 must be στ or σSτ. Now B_0 = C_1 = E_1 A_1 F_1 = E_1 S F_1, hence B_0 = σSτ and E_1 = σ, F_1 = τ. Thus C_2 = σB_1τ = E ∈ {σ, τ}*, and so B_1 = στ. Therefore,

E = C_2 = σσττ = σ^2 τ^2.

Continuing in the same way, we see that, if S → E has an n-step derivation, then E = σ^n τ^n. If properly done, this proof should be by induction on n.
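The induction is easy to check mechanically. A small sketch (ours), writing s for σ and t for τ, enumerates all strings obtainable from S in exactly n steps and confirms the claim for the first few n:

```python
# Enumerate the strings reachable from S by exactly n derivation steps in G_1.
def derivations(string, steps):
    if steps == 0:
        yield string
        return
    for i, c in enumerate(string):
        if c == "S":
            for rhs in ("st", "sSt"):          # S -> st and S -> sSt
                yield from derivations(string[:i] + rhs + string[i+1:], steps - 1)

for n in range(1, 6):
    terminal = {w for w in derivations("S", n) if "S" not in w}
    assert terminal == {"s" * n + "t" * n}     # sigma^n tau^n, as asserted
```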
We shall now return to the notion of "parsing tree", which was informally introduced
in Section 4.1. While proofs in tree form and derivations are valid techniques for all
grammars, parsing trees are essentially confined to context-free grammars. (Their use has
been extended to context-sensitive grammars, but these will not be discussed here.)
In a context-free grammar, each production has the form

A → A_1 A_2 ... A_n

where A, A_1, A_2, ..., A_n are words and n ≥ 1. Take for instance the case n = 3; then we associate with this production the following picture:

(Picture: a node A with three downward branches ending in A_1, A_2 and A_3.)
Now suppose we are interested in a derivation A → E. Then either the string E starts with A_1 or else, at some stage of the derivation, A_1 is replaced with the help of a production

A_1 → A_{11} A_{12} ... A_{1n_1}

Take, for instance, the case n_1 = 2; then we enlarge the picture to

(Picture: the same tree with A_1 given two further branches ending in A_{11} and A_{12}.)
Continuing in this fashion, we ultimately obtain a tree where all the words occurring in
the string E appear at the end of the branches. Usually we are interested in the situation
A = S, E G S*. The resulting tree is called a parsing tree of E.
For example, let us look at the grammar G = (V, {σ, τ}, {S}, P) with vocabulary V = {σ, τ, S, U, V} and productions P: U → σ, U → σU, V → τ, V → τV, S → UV.

We shall show that σ^2 τ is a sentence generated by this grammar

(a) by giving a proof in tree form of the formula S → σ^2 τ,

(b) by presenting a derivation,

(c) by exhibiting a parsing tree.
(a) A proof in tree form: from U → σ (P1) we obtain σU → σσ by (S'); combined with the production U → σU, the transitive rule (T) gives U → σσ. Similarly, from S → UV, U → σσ and V → τ (P3), two applications of (S') and (T) give S → σστ.

(b) A derivation:

S, UV, σUV, σUτ, σστ

(c) A parsing tree:

(Picture: S branches to U and V; U branches to σ and U, the inner U to σ; V to τ.)
Exercise
1. Show that σ^3 τ^2 ∈ L(G)
a. by a proof in tree form,
b. by a derivation,
c. by a parsing tree.
4.8 A necessary condition for a language to be context-free
We asserted in Section 4.4 that the intersection of two context-free languages need not be context-free. Indeed, let us consider the terminal vocabulary Σ = {σ, τ} and the languages

L_3 = {σ^m τ^m σ^p | m ≥ 1, p ≥ 1}

L_4 = {σ^q τ^n σ^n | q ≥ 1, n ≥ 1}
We shall see that L_3 and L_4 have context-free grammars but that L_3 ∩ L_4 does not. First, observe that L_3 = L_1 L_5, where

L_1 = {σ^m τ^m | m ≥ 1}

has already been met in Section 4.2, and

L_5 = {σ^p | p ≥ 1}

has the obvious grammar: S_5 → σ, S_5 → σS_5. Therefore, L_3 is the product of two context-free languages and thus also context-free, by Theorem 4.4.1. Now L_4 is clearly the mirror image of L_3, hence also context-free, by the same theorem.

Now it is clear that

L_3 ∩ L_4 = {σ^k τ^k σ^k | k ≥ 1}
We shall see that this language is not context-free. To show this, we require a result by
Bar-Hillel, Perles and Shamir.
4.8.1 Proposition  If L ⊆ Σ* is an infinite language with a context-free grammar, then there exist strings A, B, C, D, E ∈ Σ* such that A B^n C D^n E ∈ L for all n ≥ 0, where B and D are not both empty.
For example, in the fragment of English discussed in Section 4.1, we might take
A = the man hit the cat
B = and the man hit the cat
C = and
D = the man hit the cat and
E = the man hit the cat
Then ABCDE, ABBCDDE, ABBBCDDDE, etc. are all grammatical sentences.
We shall not prove this result now, but apply it to the discussion of L = L_3 ∩ L_4. Let us suppose that L = L_3 ∩ L_4 is a context-free language. Then, by the proposition, there exist strings A, B, C, D, E such that F_n = A B^n C D^n E ∈ L for all n ≥ 0, and B and D are not both empty. In particular, F_1 = ABCDE = σ^k τ^k σ^k and F_2 = ABBCDDE = σ^l τ^l σ^l, for certain k, l ≥ 1. Moreover, since B and D are not both empty, l is strictly larger than k.

Now since A is an initial segment of F_1, we must have either that A = σ^p for some p ≤ k or that A begins with the string σ^k τ. But in the latter case, it follows that F_2 = AB··· also begins with the string σ^k τ, which is impossible since F_2 = σ^l ··· with l > k. Therefore A = σ^p, p ≤ k, and, by the mirror-image argument, E = σ^q, with q ≤ k. We then get

BCD = σ^{k-p} τ^k σ^{k-q}
from F_1, and

BBCDD = σ^{l-p} τ^l σ^{l-q}

from F_2. If B = σ···τ···, we would obtain

BBCDD = σ···τ···σ···τ···

contradicting the above. Therefore, B is a power of σ or a power of τ. Suppose B = τ^i, i ≥ 1; then the formulas for BCD and BBCDD tell us that k - p = 0 and l - p = 0, again leading to the contradiction that k = l. Therefore, B = σ^i, and, similarly, D = σ^j. Thus, from the formula for BCD, we obtain

C = σ^{k-p-i} τ^k σ^{k-q-j}

and, from the formula for BBCDD, we obtain

C = σ^{l-p-2i} τ^l σ^{l-q-2j}

Thus, again we are led to the contradiction that k = l.

We have now exhausted all the possible cases, so that L_3 ∩ L_4 cannot be a context-free language.
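For a small value of k the case analysis can also be confirmed by brute force. The following sketch (ours), writing s for σ and t for τ, tries, for k = 2, every decomposition ABCDE of σ^2 τ^2 σ^2 with B and D not both empty, and finds that none survives even the three instances n = 0, 1, 2:

```python
# For w = sigma^2 tau^2 sigma^2, try every decomposition w = A B C D E with
# B and D not both empty, and test A B^n C D^n E for n = 0, 1, 2.
def in_language(w):
    k = len(w) // 3
    return k >= 1 and w == "s" * k + "t" * k + "s" * k

w = "ssttss"
survivors = []
for i in range(len(w) + 1):
    for j in range(i, len(w) + 1):
        for p in range(j, len(w) + 1):
            for q in range(p, len(w) + 1):
                A, B, C, D, E = w[:i], w[i:j], w[j:p], w[p:q], w[q:]
                if (B or D) and all(in_language(A + B * n + C + D * n + E)
                                    for n in (0, 1, 2)):
                    survivors.append((A, B, C, D, E))
print(survivors)    # [] -- as the argument above predicts
```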
We shall prove Proposition 4.8.1 after some discussion.
Let us recall Example 4.1.1 of Section 4.1. The 72 sentences of that language may be summarized by the formula:

S → (the ∨ a)(man ∨ ball ∨ cat)(hit ∨ took)(the ∨ a)(man ∨ ball ∨ cat)

Indeed, if we regard multiplication as being distributive over ∨, we obtain:

S → (the man hit the cat) ∨ (the man hit the ball) ∨ ···

We have seen how adding the single word and will make the language infinite, provided we add only the following production: S → S and S. Or we could add the single word poor and the production N → poor N, and again we would get infinitely many sentences of the form

S → the (poor)^n man hit the cat,   n = 0, 1, 2, ...

In both cases we have a production of the form T → UTV, where T belongs to the auxiliary vocabulary and U and V are not both empty. In the first case: T = S, U = 1, V = and S; in the second case: T = N, U = poor, V = 1. (Recall that 1 = the empty string.) This idea will be used in the proof of the proposition.
First, we shall establish a lemma.
4.8.2 Lemma  Let L be a language with a context-free grammar G. Then, without loss of generality, we may assume that, in any production A → B, either B is a terminal word or it is a string containing at least two words.

Proof  For the purpose of this proof, let us call the nonterminal word B "bad" if it occurs alone on the right hand side of a production of G. Let

A_1 → B,  A_2 → B,  ...,  A_m → B

be a complete list of these productions. We also make a list of all productions which contain the bad word on the left hand side:

B → C_1,  B → C_2,  ...,  B → C_n

where the C_j are strings. We may also assume that all C_j ≠ B, as B → B is an instance of the reflexive law and need not be given as a production.
We now form a new grammar G'. This differs from G in only one respect: the m productions A_i → B and the n productions B → C_j are replaced by the mn productions A_i → C_j.

We note first of all that G' has one less bad word than G. For none of the C_j is B, and if C_j is bad in G', that is, a nonterminal word, then it was already bad in G.
We note secondly that G' generates the same language L as does G. For suppose the production A_i → C_j is used in showing that S → D is a theorem in D(G'), for some terminal string D; then A_i → B and B → C_j may be used to establish the same theorem in D(G). Conversely, suppose the production A_i → B is used in showing that S → D is a theorem in D(G), D being a terminal string; then the parsing tree of D must contain as a part:

(Picture: A_i above B, with B branching to E_1, E_2, ..., E_k.)

where E_1 E_2 ··· E_k = C_j for some j. Replacing this part by

(Picture: A_i branching directly to E_1, E_2, ..., E_k.)

we obtain a parsing tree for D in G', which utilizes the production A_i → C_j in place of A_i → B.

Thus we have replaced the grammar G by another grammar G' which generates the same language as G, but which contains one bad word less. Repeating this process, we ultimately get rid of all bad words without changing the language generated. ∎
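In modern terminology the bad productions are the unit productions, and the construction of the lemma is the usual unit-production elimination. A sketch (ours), which assumes for simplicity that there are no cycles of unit productions, and which keeps the productions B → C_j, so that B may merely become unreachable:

```python
# Repeatedly replace the pairs A_i -> B, B -> C_j (B a lone nonterminal on the
# right) by A_i -> C_j, without changing the generated language.
def remove_bad_words(prods, nonterminals):
    prods = set(prods)
    while True:
        unit = next(((a, r) for (a, r) in prods
                     if len(r) == 1 and r[0] in nonterminals), None)
        if unit is None:
            return prods
        a, (b,) = unit
        prods.remove(unit)
        prods |= {(a, r) for (x, r) in prods if x == b}

grammar = {("S", ("U", "V")), ("U", ("W",)), ("W", ("s",)), ("V", ("t",))}
for lhs, rhs in sorted(remove_bad_words(grammar, {"S", "U", "V", "W"})):
    print(lhs, "->", " ".join(rhs))   # U -> s replaces the bad U -> W
```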
We are finally able to give a proof of Proposition 4.8.1. By the depth of a parsing tree, we mean the length of the longest downward path one can follow along the tree. It corresponds to the number of steps in a derivation.
Proof  Let L be an infinite language with a context-free grammar that satisfies the conclusions of Lemma 4.8.2. In what follows, let

m be the maximal length of all strings B which may appear on the right hand side of a production A → B, and

p be the total number of words in the vocabulary V.

We will now bound the number of parsing trees of depth at most l. The number of trees of depth 0 is 2 (including the empty tree) and the number of trees of depth at most 1 is m + 2, since the only such trees are the 2 of depth 0 and the ones of depth 1 with the top node and a k-fold branch with 1 ≤ k ≤ m. Let f(l) denote the number of trees of depth at most l. A tree of depth at most l + 1 can be gotten (usually in more than one way) by replacing each of the bottom nodes of one of the m + 2 trees of depth at most one by a tree of depth at most l; since there are at most m such bottom nodes, the number of possibilities is at most f(l)^m. In particular, we have that f(2) ≤ (m + 2)^m, which gives f(3) ≤ f(2)^m ≤ ((m + 2)^m)^m = (m + 2)^{m^2}, and generally f(l) ≤ (m + 2)^{m^{l-1}}.

A parsing tree is gotten by replacing each node in a tree by a word in the vocabulary. Since each node of a parsing tree may be occupied by at most p words, the total number of parsing trees of depth at most l cannot exceed p^{f(l)}; hence there cannot be more than that number of sentences in the language L.
Given that the number of sentences in L is infinite, it follows that there must be parsing trees of arbitrary depth. Therefore, there exist parsing trees with branches of arbitrary length. In particular, there will exist a parsing tree with a path of length q > p. For convenience, we shall picture this path horizontally:

S = C_0 - C_1 - C_2 - ... - C_{q-1} ∈ Σ

a path in the tree, with each C_i in the vocabulary V. Since the total number of words in the vocabulary is p < q, the words appearing in this branch cannot all be different; say C_i = C_j, where i < j ≤ q - 1.

Now for each k such that i ≤ k < j there is a production

C_k → B_k C_{k+1} D_k
with B_k, D_k ∈ V*. Since we have supposed that G satisfies the conclusions of Lemma 4.8.2, it follows that at least one of B_k and D_k is nonempty. In particular, either B_i or D_i is nonempty. We can therefore deduce a (j - i)-step derivation

C_i → B' C_j D'

where B' = B_i B_{i+1} ... B_{j-1} and D' = D_{j-1} ... D_i, and at least one of B' and D' is nonempty. Putting C' = C_i = C_j, we obtain a theorem

C' → B' C' D'     (1)

Similarly we may obtain a theorem

S → A' C' E'     (2)

where A' and E' are strings that might be empty. From (1) and (2) we deduce

S → A' B' C' D' E' → A' B' B' C' D' D' E'

and so on; in fact,

S → A' (B')^n C' (D')^n E'     (3)

for all n ≥ 0.
We have almost proved the proposition, except that there is no reason for A', B', C', D' and E' to be terminal strings. It remains to be shown that they can be replaced by terminal strings in (3).

Take for instance B' = B_i B_{i+1} ... B_{j-1}. Now B_i occurs in a parsing tree, hence there is a theorem

B_i → β_1 β_2 ··· β_{n_i}

in our deductive system, where the β_r are the terminal words appearing at the endpoints of branches emanating from B_i. Applying the same reasoning to B_{i+1}, ..., B_{j-1}, we obtain a theorem

B' → β_1 β_2 ··· β_n = B

Thus there are theorems

A' → A,  B' → B,  C' → C,  D' → D,  E' → E     (4)

where A, B, C, D and E are terminal strings. Since our grammar is context-free, B contains at least as many words as B' and D at least as many as D'. Since B' or D' is known to be nonempty, it follows that B or D is nonempty. Finally, from (3) and (4), we obtain the theorem

S → A B^n C D^n E

for all n ≥ 0, and our proof is complete. ∎
Chapter 5
Generative and transformational
grammars
5.1 Inflection rules: conjugation
Traditional grammar distinguishes between syntax and morphology, the former dealing
with the formation of sentences from words, while the latter deals with the formation of
words themselves. By now the reader should be convinced that context-free grammars
constitute a useful tool for dealing with syntax. Let us now look at such rules as:
Past work → worked
Plur man → men
Part be → being
If we count Past, Plur and Part among the words of the auxiliary vocabulary, these
rules clearly do not fit into a context-free grammar, or even a context-sensitive gram-
mar, as the left hand side of these productions contains more words than the right hand
side. Unfortunately, English is still an inflected language, and any grammar of English
must incorporate such inflection rules, or "morphographemic" rules, as it may be more
fashionable to call them.
Let us begin by looking at the inflected forms of the English verb. We note that for each verb V there are 6 forms C_{ki}(V), where k = 1, 2 refers to the tenses present and past, and i = 1, 2, 3 refers to the three persons usually associated with the pronouns "I", "you" and "he" or "she". These 6 forms are conveniently arranged in the form of a matrix:

C(V) = ( C_{11}(V)  C_{12}(V)  C_{13}(V) )
       ( C_{21}(V)  C_{22}(V)  C_{23}(V) )
Here are some examples:

C(work) = ( work    work    works  )
          ( worked  worked  worked )

C(go) =   ( go    go    goes )
          ( went  went  went )

C(can) =  ( can    can    can   )
          ( could  could  could )

C(must) = ( must  must  must )
          ( must  must  must )

C(be) =   ( am   are   is  )
          ( was  were  was )
A few comments are in order.
1. English verbs are usually presented in the infinitive, except modal auxiliaries such as
can and must, which do not have an infinitive.
2. We have mentioned two simple tenses: present and past. Compound tenses will be
dealt with in the next section. There are two more simple tenses in English: the present
and past subjunctive. However, these occur so rarely that we have chosen to ignore
them here.
3. We have mentioned three persons of the singular. What about the three persons of
the plural? We may safely identify these with the second person of the singular at this
stage of our investigation, as the verb forms of the plural always coincide with those of
the second person singular.
4. The conjugation matrix for work is regular and serves as a model for most English
verbs. Any departure from this model ought to be listed in the dictionary.
5. Actually, at most five of the six forms mentioned are distinct, as always C_{21}(V) = C_{23}(V). The maximum diversity is achieved for V = be, and for V = must all six forms coincide.
After this preliminary discussion we are able to state productions which will generate
some elementary English sentences, due attention being paid to the morphology.
(1) S → Subj Pred

(2) Subj → NP_i P_i,   (i = 1, 2, 3)

(3) NP_1 → I
    NP_2 → you, we, they
    NP_3 → he, she, it, one, Name
    Name → John, Jane, ...

(4) Pred → Tense VP

(5) Tense → T_1, T_2

(6) VP → V_intr, V_tr Obj, ...

(7) V_intr → work, rest, go, ...
    V_tr → like, call, hit, take, ...

(8) Obj → Acc NP_i,   (i = 1, 2, 3)

(9) P_i T_k V → C_{ki}(V),   (i = 1, 2, 3; k = 1, 2), for any verb V

(10) Acc I → me
     Acc he → him
     Acc she → her
     Acc we → us
     Acc they → them
     Acc X → X otherwise
The auxiliary vocabulary consists, aside from Name and Tense, of the following abbreviations:

S = statement
Subj = subject
Pred = predicate
NP_i = ith person noun phrase
P_i = ith person
T_1 = present
T_2 = past
VP = verb phrase
V_intr = intransitive verb
V_tr = transitive verb
Obj = object
Acc = accusative case
For the moment, V is called a "verb" if there is a production V_intr → V or V_tr → V.
Other verbs will be discussed later. The distinction between transitive and intransitive
verbs is complicated by the fact that some intransitive verbs can be used transitively, as in
I work him hard, and some transitive verbs permit object deletion, as in she called today,
but we shall ignore these complications here.
We note that productions (1) to (8) satisfy the definition of a context-free grammar; only productions (9) and (10) do not. For example, let us see how we can generate
the sentence she works. Staying within the context-free part of the grammar, we may
construct the following parsing tree:
(Picture: parsing tree with S branching to Subj and Pred; Subj to NP_3 (she) and P_3; Pred to Tense (T_1) and VP; VP to V_intr (work).)
This parsing tree yields only the pseudo-sentence:
she P_3 T_1 work
To convert this into a terminal string, we apply rule (9):
P_3 T_1 work → C_{13}(work) = works
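Rule (9) is easily mechanized: scan the pseudo-sentence for a pattern P_i T_k V and replace it by the matrix entry C_{ki}(V). A sketch (ours; the dictionary lists only the verbs needed here, and rule (10) for Acc is not included):

```python
# A sketch of inflection rule (9): each pattern P_i T_k V in a pseudo-sentence
# is replaced by the conjugation-matrix entry C_ki(V).
C = {   # C[V][k-1][i-1], k = tense (1 = present, 2 = past), i = person
    "work": [["work", "work", "works"], ["worked", "worked", "worked"]],
    "like": [["like", "like", "likes"], ["liked", "liked", "liked"]],
}

def inflect(pseudo):
    words, out = pseudo.split(), []
    while words:
        if (len(words) >= 3 and words[0][0] == "P"
                and words[1][0] == "T" and words[2] in C):
            i, k = int(words[0][1]), int(words[1][1])
            out.append(C[words[2]][k - 1][i - 1])
            del words[:3]
        else:
            out.append(words.pop(0))
    return " ".join(out)

print(inflect("she P3 T1 work"))        # she works
print(inflect("I P1 T2 like Acc she"))  # I liked Acc she (rule (10) still owed)
```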
Here is another example:

(Picture: parsing tree with S branching to Subj and Pred; Subj to NP_1 (I) and P_1; Pred to Tense (T_2) and VP; VP to V_tr (like) and Obj; Obj to Acc and NP_3 (she).)
The inflection rules used here are:

P_1 T_2 like → C_{21}(like) = liked,   Acc she → her

yielding the sentence I liked her.
Exercises
1. Consider the following sentences:
a. I like her
b. she likes me
c. we worked
d. John took it
Construct the underlying parsing trees and show which productions yield the terminal
strings.
2. Suppose we add to the above productions the following:
NP_2 → NP_i and NP_j
Show that the following four strings then become sentences:
a. John and I work
b. I and John work
c. she likes me and John
d. she likes John and I
3. Suppose, in place of the production in Exercise 2, we postulate the following two
productions:
Subj → NP_i and NP_j P_2
Obj → Acc NP_i and Acc NP_j
Show that (a), (b) and (c) in Exercise 2 remain sentences, but that (d) will be replaced
by:
(d') she likes John and me
5.2 More about tense and verb phrase
While Section 5.1 has served to illustrate some general principles, only the simplest sen-
tences can be generated by the productions considered so far. In particular, we have only
looked at two simple tenses: present and past. However, even aside from the subjunctive,
which we have chosen to ignore, there are lots of compound tenses. We shall now amplify
production (5) of Section 5.1:
(5.1) Tense → T_i Mod Asp_1 Asp_2,   (i = 1, 2)

(5.2) Mod → 1, Kod

(5.3) Asp_1 → 1, have Perf

(5.4) Asp_2 → 1, be Part

(5.1) to (5.4) are usually summarized as follows:

Tense → T_i (Kod)(have Perf)(be Part)
We require the following further productions:
(5.5) Kod → must, may, can, shall, will

(5.6) Part work → working
      Part take → taking
      Perf work → worked
      Perf take → taken
      Perf be → been

We have used a number of new auxiliary words:

Mod = modality
Kod = modal auxiliary verb
Asp_i = aspect of type i, (i = 1, 2)
Perf = past participle (= perfect)
Part = present participle
(The word "perfect" is borrowed from Latin grammar, where it stands for completed
action. Curiously, in English it is the simple past that is perfective in that sense, while it
is the compound past, whose meaning is generally imperfective, that is called the perfect. This illustrates the folly of attempting to impose the grammar of one language on another.)
The class of verbs has now been enlarged by V = have and V = be, and by such V for which there are productions Kod → V. It should be pointed out that some verbs, such as like, are not supposed to admit a progressive form, although one hears such sentences as how are you liking your new dress?

(Picture: parsing tree for she has been taking it.)

This parsing tree yields a pseudo-sentence, which is converted into a terminal string as follows:

she P_3 T_1 have Perf be Part take Acc it → she has been taking it
Now let us turn attention to the "verb phrase" introduced in Section 5.1. (It should
be pointed out that we have used this term more narrowly than other authors, whose VP
comprises tense and person.) The following expansion of rule (6) of Section 5.1 gives only
a very partial account of the possible verb phrase constructions in English:
(6) VP → V_intr, V_tr Obj, V_cop Adj, V_give Obj Obj, V_give Obj to Obj, V_make Obj Adj

(6.1) V_cop → be, stay, seem, look, ...

(6.2) Adj → poor, old, good, green, ...

(6.3) V_make → make, render, leave, ...

(6.4) V_give → give, teach, tell, show, ...

Here

V_cop = copula
Adj = adjective
V_make = verb like "make"
V_give = verb like "give"

The class of verbs has been enlarged, and we might summarize all the verbs introduced hitherto by a production:

Verb → V_intr, V_tr, have, be, Kod, V_cop, V_make, V_give, ...
There is some duplication in this list: for example, V_tr → have. Moreover, V_make and V_give seem to be causally related to V_cop and V_tr: make means cause to be, give means cause to have, teach means cause to know, etc.
As usual, we have neglected to specify what types of objects can be taken by these
verbs. For example, in
X teaches Y to Z
which is equivalent to
X teaches Z Y
Z normally denotes a person or an animal, and Y denotes something but not somebody,
while in
Z knows Y
Y may denote somebody as well as something. Many grammarians try to incorporate
such restrictions into the grammar. However, it is difficult to be consistent. One author
recently asserted that the subject of eat must be animate. Unfortunately, he went on to
argue his case by saying that stones do not eat, thus inadvertently using the construction
which he claimed is ungrammatical.
Here is one illustration of the expanded version of (6):

(Picture: parsing tree with S branching to NP_3 (she), P_3, Tense (T_1) and VP; VP to V_cop (look) and Adj (good); the inflection rule P_3 T_1 look → C_{13}(look) = looks yields she looks good.)

Here we did not make use of the expanded version of (5), or we should have replaced the Tense part of the parsing tree by the subtree in which Tense branches to T_1, Mod, Asp_1 and Asp_2.
Exercises
1. For the following sentences construct the underlying parsing trees and show how the
terminal strings are derived:
a. he may have hit her
b. she had hit him
c. she would hit him
d. I was resting
e. you should have been working
2. Suppose you amplify (5.4) by introducing the productions:
Asp_2 → V_prog Part
V_prog → be, start, stop, keep
Show that the following are sentences:
a. he has stopped hitting her
b. she kept calling him
3. Construct the underlying parsing trees of the following:
a. they left her poor
b. I teach it to her
c. she could have shown me it
5.3 More about the noun phrase
The only noun phrases we had admitted in Section 5.1 were names and pronouns. This
is of course inadequate: we must also consider noun phrases built up from nouns. The
problem is that some nouns must and other nouns may be preceded by an article or
another determiner.
Aside from pronouns, and names, there are in fact several different kinds of nouns:
1. Count nouns (= N_count) such as man, cat, ball, etc. They require an article or another determiner and admit plural formation.

2. Plurals (= N_plur) may be plurals of count nouns, e.g. men, cats, balls, etc., or may not possess a singular at all, e.g. scissors. They admit the definite article the, but not the indefinite article a(n).

3. Substance nouns (= N_subst) such as milk, soap, bread, etc. They admit the definite but not the indefinite article.
If a short digression is permitted, we shall make some remarks of a historical and
philosophical nature on the difference between count nouns and names of substances.
One can say one man, two cats, three balls, etc., but one must say one glass of milk,
two cakes of soap, three loaves of bread, etc. Some similar linguistic phenomenon in an-
cient Greek presumably induced Aristotle to develop his theory of substance and form.
The idea is simply this: when talking about a loaf of bread, bread denotes the substance
and loaf the form. Other ancient philosophers were led to ask: how many substances are
there in nature? The traditional answer was: four, namely, earth, water, air and fire; but
each of these had its champions who proclaimed it to be the only substance. Then there
was the theory that substance=mat(t)er is female, while the form is male, but this takes
us too far afield.
Whether something is denoted by a count noun or is regarded as a substance is often
a question of size. The English language decrees that beans can be counted, but that
rice cannot, unless in the form of grains. At one time peas were regarded as too small for counting and there was an old substance noun pease, as in pease porridge. However, later peas(e) was regarded as the plural of the newly coined word pea. Perhaps some day rice will be considered as the plural of rouse.
Articles are not the only kind of determiners. In fact we shall distinguish several types
of determiners:
D_count = determiners which may be used for count nouns
D_subst = determiners which may be used for substance nouns
D_plur = plural determiners, which may be used for plurals
D_sing = singular determiners, which may be used for count and substance nouns
D_univ = universal determiners, which may be used for count nouns, substance nouns and plurals.
Rule (3) of Section 5.1 should now be supplemented by the following new productions:

(3.1) NP_2 → D_plur N_plur

(3.2) NP_3 → D_count N_count, D_subst N_subst, somebody, something, ...

(3.3) D_plur → D_univ, these, those, 1

(3.4) D_univ → the, my, his, her, your, some, no, ...

(3.5) D_count → D_sing, a(n), one, each, every

(3.6) D_sing → D_univ, this, that

(3.7) D_subst → D_sing, 1

(3.5) must be supplemented by the following rule:

(3.8) a(n) X → an X if X starts with a vowel,
      → a X if X starts with a consonant.

However, (3.8) refers to the pronunciation of X and not its spelling.
In the following examples, we make use of the grammar of Section 5.1, but not of the grammar of Section 5.2.

(Picture: parsing tree deriving the sentence your cat likes milk: the subject NP_3 expands to D_count N_count → your cat, the object NP_3 to D_subst N_subst → milk (with D_subst → 1), and P_3 T_1 like → C_{13}(like) = likes.)
While we have distinguished between substance nouns and count nouns, we have not
distinguished between masculine and feminine, animate and inanimate, etc. Thus we
cannot rule out questionable sentences such as milk likes your cat.
There are many other questions that we have left open. Where do the numerals and
the word all belong? What about your two cats, all her cats, etc.?
Exercises
1. Utilizing the grammar of Sections 5.1 and 5.3, construct the underlying parsing trees
for the following sentences and show how the terminal strings are derived:
a. each man works
b. the man hit my cat
2. Utilizing also the grammar of Section 5.2, do the same for
a. your cat was resting
3. What additional productions are required to generate the following sentences:
a. this is good
b. I like that
5.4 The passive construction and transformational grammar
We shall discuss the English passive construction, first within generative grammar, and
then from the point of view of transformational grammar, it being our intention to intro-
duce the latter concept in this section.
Everyone knows that the sentences

the man { takes / took / will take } the ball

give rise to the passive forms

the ball { is taken / was taken / will be taken } by the man

the prepositional phrase by the man being optional. It is clear from these examples that
the passive of a sentence depends on its structure, that is, on its underlying parsing tree.
Corresponding to the production

(1) S → NP_i P_i Tense V_tr Acc NP_j

where i, j = 1, 2, 3, we may form quite generally

(2) S → NP_j P_j Tense be Perf V_tr (by Acc NP_i)

Terminal strings derived from (2) may be regarded as passives of terminal strings derived from (1). From the viewpoint of generative grammar, we are not interested in the relationship between these two terminal strings, but only in the grammaticalness of the former.
The easiest way to assure that (2) and all consequences of (2) are theorems in the deductive system D(G) is to postulate the following new productions, augmenting rule (6) of Section 5.2:

(6.5) VP → be Perf V_tr Ag
      Ag → by Obj

where

Ag = agent
Here is an example of a passive construction based on (6.5):

(Picture: parsing tree of a sentence whose verb phrase comes out as had been being hit.)

(Stylistically the resulting sentence may be improved if we replace being by getting, as in Exercise 2 at the end of this section.)
While (6.5) only deals with transitive verbs, verbs like make and give also admit
passives, for example
he was rendered poor by her
she is shown it
it is shown to her
We are therefore led to supplement (6.5) as follows (see Section 5.2 for the verbs involved):

(6.6) VP → be Perf V_make Adj Ag
         → be Perf V_give Obj Ag
         → be Perf V_give to Obj Ag
One could go on in this way, but perhaps a more radical approach is called for, which will account for all these rules at once. Here is one suggestion:

(6.5*) VP → Pass VP
       Pass V → be Perf V Ψ
       Ψ Obj → 1
Here Pass = passive, Ψ is an auxiliary symbol, and V is assumed to be any verb which possesses a passive. This last proviso could be used to prevent us from forming the passive of the following sentences:
the play lasts three hours
this colour becomes you
he went the whole way
This approach could be extended to account for sentences such as
a new idea is called for
but we shall not pursue it any further.
We have discussed the construction of passive sentences within the framework of gener-
ative grammar. However, this approach does not explain what sentence a passive sentence
is the passive of. It does not explain why it is that in
Y is eaten by X
X normally denotes an animal or a person just as in
X eats Y
It does not explain why in both sentences Y may be soup in Canada and tea in England.
For these and other reasons, Chomsky has attempted to view the passive construction
as a "transformation". According to this view, a transformation transforms one pars-
ing tree into another. Thus, the passive transformation corresponding to (6.5) would
transform the parsing tree
(Picture: parsing tree of (1), with NP_i expanding to X, Tense to T, V_tr to V and Obj to Acc NP_j → Acc Y)

into the parsing tree

(Picture: parsing tree of (2), with NP_j expanding to Y, Tense to T, V_tr to V, and Ag to by Acc NP_i → by Acc X)
Here X and Y are terminal strings, V is a terminal word and T is a string made up of terminal words and inflections (T_k, Perf and Part). The branches to those strings are really abbreviations: a branch from NP_i directly to each man, for example, stands for the subtree in which NP_i branches to D_count and N_count, with D_count → each and N_count → man.
How can we interpret this process in more familiar mathematical terms? One way, which is close to Chomsky's original view, would be to regard it as a rule of inference as follows:

S → X P_i T V Acc Y,   NP_i → X,   Tense → T,   V_tr → V,   NP_j → Y
----------------------------------------------------------------------
S → Y P_j T be Perf V (by Acc X)
A rule of inference in a deductive system is a rule that says that if you have derived
a certain set of strings, then you may derive another string. The rules (S) and (T) of
Section 4.2 are rules of inference.
In the present instance the new rule says that if there are productions that derive the string X P_i T V Acc Y from S, the string X from NP_i, the string T from Tense, the string V from V_tr (which means that V is a transitive verb) and the string Y from NP_j, then derive the string Y P_j T be Perf V (by Acc X) from S.
Recall that a production A → B of the generative grammar G was regarded as an axiom in the deductive system D(G) in Chapter 4. We may now define a transformational grammar G' as stipulating not only axioms (productions) but also rules of inference (= grammatical transformations) for a deductive system D(G'). We replaced (6.5) by a rule of inference, and we can do the same for each of the productions (6.6). One is tempted to combine all the resulting rules of inference into one, as follows:

S → X P_i T V Acc Y Z,   NP_i → X,   Tense → T,   V_pass → V,   NP_j → Y
--------------------------------------------------------------------------
S → Y P_j T be Perf V Z (by Acc X)

where

V_pass = verb possessing a passive
There are however some difficulties concerning Z, the structure of which should be spec-
ified in some way. We do not want to transform
the cat chased John and me
into
*John was chased and me
or
I saw the father of my friend
into
*the father was seen of my friend
Apparently it is not easy to formulate the correct restrictions on Z, and so perhaps the
pedestrian approach of listing all the rules separately is to be preferred.
Here is another approach to the generation of passives. From such examples as

I found the door closed

there is a well-motivated need for a rule of the form

Adj → Perf VP⁻

where VP⁻ stands for a verb phrase with its direct object (or its first object if it takes more than one) deleted. Assuming such a rule, we then have productions such as

S → NP V_cop Adj → Art N V_cop Perf VP⁻ → the book was given to Mary

where, since we have a production

VP → give the book to Mary

we also have a production

VP⁻ → give to Mary

If we apply the same production to

VP → give Mary a book

we derive instead the sentence

Mary was given a book
This proposal is also most simply formulated as a rule of inference. We leave the details to the reader. The point of this proposal is that it does not add to the language any rule that is not required in any case. It derives the passive as a special case of two rules: the one that gives predicative adjective sentences and the one that turns a past participle into an adjective. The agent phrase, if any, is derived by the rule that says

S → S PP

where PP stands for prepositional phrase. This explains the optional nature of the agent clause.
Commentary by MB
I am not entirely convinced by this analysis of the passive. First, it is well motivated that
English has a rule of the form Adj → Part since so many participles function as pure adjectives
(however notice the contrast between "The door was open" and "The door was opened"; but there are not two words in the case of "closed"). At any rate, it is not necessary to posit a
passive transformation since a sentence such as the door was closed by him is generated by the
production
S → S PP → NP VP PP → Art N V_cop Adj Prep Pron → the door was closed by him
It would be harder, however, to fit the negative and interrogative into the same mold as the
declarative. And many languages would have to be studied to determine if this kind of analysis
of the passive is general. It is interesting to note that this passive "transformation" was one
of the main motivations for transformational grammar, but has now been largely abandoned.
Another point in favor of the transformational approach is that it gives a ready explanation for
the semantic relation of the active and passive modes, which my account does not. The jury is
still out.
Exercises
1. Using (6.5), exhibit the underlying parsing trees of the following sentences:
the ball may have been taken by her
she is being hit by the ball
2. Discuss a modification of (6.5) which will allow two passive auxiliaries
U_pass → be, get
the latter as in she got hit.
3. Show that the proposal of the last paragraphs allows even more passives, such as
the door stayed closed
the book seemed damaged by the rain
The last is perhaps marginal, but not obviously bad.
5.5 Some other transformations
English abounds with constructions that are usually viewed as grammatical transforma-
tions. We shall content ourselves with a few examples. However, we shall not insist
on treating a construction by the methods of transformational grammar if generative
grammar will do.
5.5.1 EXAMPLE The negative transformation assigns to any English sentence its
"grammatical" negation. This should be distinguished from the logical negation, with
which it usually coincides, but not always. For example, the grammatical negation of
John must work
is
John must not work
while its logical negation is
John need not work
We find it most convenient to formulate the negative transformation as a set of produc-
tions. Writing
Neg = negative
we postulate
(11) Mod → Neg Mod

(12) Neg have Perf → have not Perf

(13) Neg have Obj → do not have Obj

(14) Neg V → V not, if V = be, must, may, can, shall, will

(15) Neg V → do not V, if V ≠ have, be, must, ...

These productions are supplemented by certain optional contraction rules:

(16) is not → isn't
     was not → wasn't
     ...
     can not → cannot, can't

which also involve transfer of stress.
Note that Neg have cannot be evaluated until we know whether have is used as an
auxiliary or as a transitive verb. Admittedly, our rules do not explain
I haven't the time
she hasn't any wool
but it would be difficult to admit these sentences without also admitting
*the cat hasn't the ball
For example, in view of (11), we have the parsing tree

(Picture: parsing tree with S branching to NP (he), P_3, Tense and VP; Tense to T_2, Mod, Asp_1 and Asp_2; Mod to Neg Mod; Asp_2 to be Part; VP to V_intr (work).)

This yields the pseudo-sentence

he P_3 T_2 Neg be Part work
In view of (14), this becomes
he P_3 T_2 be not Part work
By the usual inflection rules, this reduces to
he was not working
An optional contraction rule (16) converts this to
he wasn't working
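Productions (14) and (15) lend themselves to direct transcription; a sketch (ours) that omits rules (12) and (13) for have:

```python
# A sketch of productions (14) and (15): evaluating Neg V in a pseudo-sentence.
AUX = {"be", "must", "may", "can", "shall", "will"}

def negate(words):
    out, i = [], 0
    while i < len(words):
        if words[i] == "Neg" and i + 1 < len(words):
            v = words[i + 1]
            out += [v, "not"] if v in AUX else ["do", "not", v]   # (14) vs (15)
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out

print(" ".join(negate("he P3 T2 Neg be Part work".split())))
# he P3 T2 be not Part work
```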
5.5.2 EXAMPLE The emphatic transformation is similar to the negative transforma-
tion. In place of Neg it has Emph and in place of not it has stress, to be placed on the
preceding syllable. The stress is not shown in written English (but see the last problem
of this section).
5.5.3 EXAMPLE The interrogative transformation forms a question from a statement, asking if the statement is true. For technical reasons, we begin by negating the statement and express the interrogative transformation as a rule of inference:

S → X C_{ki}(V) not Y,   NP_i → X,   V_x → V
---------------------------------------------
Q → C_{ki}(V) X Y?

Here V_x stands for one of the various classes of verbs and Q = question. The latter is a sentence type that ought to be distinguished from S, the type of statements; as, for instance, one cannot put and between a question and a statement. The rule above refers to written English and ignores the rising intonation.

As an illustration, let us look at the pseudo-sentence

he P_3 T_2 be not Part work
that was discussed in Example 5.5.1. Recalling that

P_i T_k V → C_{ki}(V)

we obtain

S → he C_{23}(be) not working,   NP_3 → he,   V_x → be
-------------------------------------------------------
Q → C_{23}(be) he working?

that is,

Q → was he working?
Our rule explains the question
do you have the time?
but not
have you the time?
If it is desirable to include the second question as grammatical, the rule could be modified. In case one of the rules (16) allows a contraction

C_{ki}(V) not → C_{ki}(V)n't

our rule should be supplemented by the following:

S → X C_{ki}(V)n't Y,   NP_i → X,   V_x → V
---------------------------------------------
Q → C_{ki}(V)n't X Y?

This yields, for instance,

Q → wasn't he working?
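On a fully inflected negative sentence the transformation simply moves the finite verb to the front and drops (or contracts with) not. A sketch (ours), assuming that not immediately follows C_{ki}(V):

```python
# A sketch of the interrogative transformation applied to an inflected
# negative sentence X C_ki(V) not Y.
def interrogative(negative_sentence, contract=False):
    words = negative_sentence.split()
    n = words.index("not")
    cv, x, y = words[n - 1], words[:n - 1], words[n + 1:]
    head = cv + ("n't" if contract else "")
    return " ".join([head] + x + y) + "?"

print(interrogative("he was not working"))                  # was he working?
print(interrogative("he was not working", contract=True))   # wasn't he working?
```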
5.5.4 Example Wh-question. Given a sentence containing he (or she) one may form a
question by changing the pronoun to who and moving it to the beginning of the sentence,
unless it was there already. However, if who is moved, the interrogative transformation of
Example 5.5.3 must be applied. For example, from (the underlying parsing tree of)
he works
one obtains
who works?
and from
this is he
one obtains
who is this?
and from
you say he drinks
one obtains
who do you say drinks?
We may try to summarize these facts with the help of two rules of inference:

(Wh-1)
S → he P_3 X,   Pred → X
-------------------------
Q → who P_3 X?

(Wh-2)
S → X C_{ki}(V) not Y he Z,   NP_i → X,   V_x → V
--------------------------------------------------
Q → who C_{ki}(V) X Y Z?

(As in Example 5.5.3, it is convenient to start with the negative.)

There are some difficulties with (Wh-2), unless the structures of Y and Z are specified. For example, (Wh-2) would allow us to infer from the negative sentence

I do not know what he will say

the question

*who do I know what will say?
and from the negative sentence
this is not John and I
the question
*who is this John and?
Supposing these difficulties can be overcome (and various suggestions have been made
for doing so), one may treat in a similar way questions starting with whom, what, where,
etc., which are obtained by replacement of him (or her), it, here, etc., respectively.
Exercises
1. Exhibit the underlying parsing trees of the following sentences:
I cannot call you
she does not work
he has not been resting
you were not called by me
she is not clever
2. Show how the interrogative transformation may be used to obtain the following
questions:
can I call you?
does she work?
has he been resting?
does he have the ball?
were you called by me?
is she clever?
3. Examine the possibility of formulating the rules of emphasis and contraction in such a way that the contraction rule is obligatory unless its operation is blocked by the not being emphasized. There is evidence for this suggestion in that, whenever contraction is not applied, the not is emphasized in normal speech.
References
Y. Bar-Hillel, M. Perles and E. Shamir, On formal properties of simple phrase structure
grammars. Z. Phonetik Sprachwiss. Kommunikat. 14 (1961), 143-172. Reprinted in Luce et al., Vol. 3.
L. S. Bobrow and M. A. Arbib, Discrete Mathematics. Saunders, Philadelphia, London,
Toronto, 1974.
G. S. Boolos and R. C. Jeffrey, Computability and Logic. Cambridge University Press,
London, New York, 1974.
S. Brainerd, Introduction to the Mathematics of Language Study. American Elsevier,
New York, 1971.
N. Chomsky, Syntactic Structures. Mouton, 's-Gravenhage, 1957.
N. Chomsky, On the formal properties of grammars. Information and Control, 2 (1959), 137-167.
N. Chomsky, Aspects of the Theory of Syntax. M.I.T. Press, Cambridge, MA, 1964.
N. Chomsky, Language and Mind. Harcourt, Brace, Jovanovich, New York, San Francisco,
Atlanta, 1972.
P. M. Cohn, Algebra and language theory. Bull. London Math. Soc. 7 (1975), 1-29.
M. Davis, Computability and Unsolvability. McGraw-Hill, New York, Toronto, London,
1958.
S. Eilenberg, Automata, Languages and Machines, Vol. A. Academic Press, New York,
1974.
E. Engeler, Introduction to the Theory of Computation. Academic Press, New York,
London, 1973.
R. Fowler, An Introduction to Transformational Syntax. Routledge and Kegan Paul,
London, 1971.
S. Ginsburg, The Mathematical Theory of Context-Free Languages. McGraw-Hill, New
York, 1966.
A. Ginzburg, Algebraic Theory of Automata. Academic Press, New York, London, 1968.
M. Gross, Mathematical Models in Linguistics. Prentice Hall, Englewood Cliffs, NJ, 1972.
M. Gross and A. Lentin, Introduction to Formal Grammars. Springer-Verlag, Berlin,
Heidelberg, New York, 1970.
Z. Harris, Mathematical Structures of Language. Interscience, Wiley, New York, London,
Sydney, Toronto, 1968.
J. E. Hopcroft and J. D. Ullman, Formal Languages and their Relation to Automata.
Addison-Wesley, Reading, MA, 1969.
S. C. Kleene, Introduction to Metamathematics. Van Nostrand, New York, 1952.
S. C. Kleene, Representations of events in nerve nets. In Shannon, 3-40.
R. R. Korfhage, Logic and Algorithms. Wiley, New York, London, Sydney, 1966.
J. Lambek, How to program an infinite abacus? Canad. Math. Bull. 4 (1961), 295-302.
J. Lambek, The mathematics of sentence structure. Amer. Math. Monthly, 65 (1958),
154-170.
J. Lambek, Contributions to the mathematical analysis of the English verb phrase. J.
Canad. Linguistic Assoc. 5 (1959), 83-89.
J. Lambek, On the calculus of syntactic types. Proc. Symposium Appl. Math. 12 (1961),
166-178, Amer. Math. Soc, Providence, RI.
J. Lambek, A mathematician looks at French conjugation. Theoretical Linguistics, 2
(1975), 203-214.
R. D. Luce, R. R. Bush and E. Galanter, eds., Handbook of Mathematical Psychology, 3 Vols.
Wiley, New York, London, 1963, 1965.
J. Lyons, Introduction to Theoretical Linguistics. Cambridge University Press, London,
New York, 1968.
A. I. Mal'cev, Algorithms and recursive functions. Wolters-Noordhoff, Groningen, 1970.
S. Marcus, Algebraic Linguistics, Analytic Models. Academic Press, New York, London,
1967.
Z. A. Melzak, An informal introduction to computability and computation. Canad. Math.
Bull. 4 (1961), 279-293.
F. Palmer, Grammar. Penguin Books, Harmondsworth, Middlesex, 1971.
P. C. Rosenbloom, The Elements of Mathematical Logic. Dover Publications, New York,
1950.
D. Scott, Some definitional suggestions for automata theory. J. Computer Systems Sci-
ences 1 (1967), 187-212.
C. E. Shannon and J. McCarthy, eds., Automata Studies. Princeton University Press,
Princeton NJ, 1956.
A. M. Turing, On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Soc. (Series 2), 42 (1936-37), 230-265; 43 (1937), 544-546.
V. H. Yngve, The depth hypothesis. Amer. Math. Soc, Proc. Symposium in Appl.
Math. 12 (1961), 130-138, Amer. Math. Soc, Providence, RI.
Appendix: A mathematician looks at
the French verb
J. Lambek
6 Introduction
The penalty one has to pay for the enjoyment of French culture is that one must face the problem of conjugating the French verb. The following considerations are offered to the mathematically inclined reader in the hope that they may help him to bypass some of the struggles the author went through in trying to cope with this problem. However, it is only fair to point out that the author still does not manage to use the "simple" past and the two subjunctives without the help of his pocket calculator.
The problem is briefly this: even forgetting about imperatives and participles, one finds associated with each French verb V an array of 42 forms C_{ik}(V), where i ranges from 1 to 7 and k from 1 to 6. Here i denotes the "simple tenses": present, imperfect, future, conditional, present subjunctive, simple past and past subjunctive. No purpose is served for the moment in subdividing these into two "moods": indicative and subjunctive. The subscript k represents the six "persons" associated with the pronouns: je, tu, il (or elle), nous, vous, ils (or elles). Again, we have chosen not to partition these persons into two "numbers": singular and plural.
How is the matrix C(V) constructed? It would be too bad if one had to learn 42 forms for each verb V. In fact, a superficial investigation shows that the matrix C(V) depends only on 5 parameters, which may be restricted to 3 on closer examination. Moreover, for most verbs in the dictionary and for all newly formed verbs, these parameters are predictable from the ending of the infinitive. It turns out that the conjugation matrix C(V) may be obtained by rewriting rules from a product A(V)B, where A(V) is a 7 by 3 matrix depending on the verb V, while B is a 3 by 6 matrix not depending on the verb. It so happens that B has only 6 nonempty entries, one in each column. Therefore, multiplication of the matrices A(V) and B is possible, even though French is only a semigroup and not a ring.
To make this article accessible to as many readers as possible, the present investigation will be concerned with written French. However, one concession has been made to spoken French: stressed vowels or diphthongs have been underlined, as an understanding of stress makes it easier to see what is going on. It has been assumed that each verb form has exactly one stressed vowel or diphthong. Regrettably it has not been found convenient to distinguish between ai pronounced é and ai pronounced è.

A very much simpler treatment could have been achieved by ignoring 10 troublesome "exceptional" verbs:

aller, avoir, falloir, pouvoir, savoir,
valoir, vouloir, être, faire, dire

However, these verbs occur so frequently that it was decided to include them in our considerations.
7 The present tense
Surprisingly, about half the effort in learning French conjugation is required for mastering the present tense. We observe that each verb form in the present tense is made up of two components: stem plus ending. We write X for the stem and provisionally let the number k = 1, 2, ..., 6 stand for the ending. The actual nature of the ending depends on the grapheme preceding it.

How the actual ending is obtained is shown by the following six rewriting rules:

(1) X1 → Xi if X ends in a
       → X if X ends in e
       → Xx if X ends in au or eu
       → Xs otherwise

Examples: j'ai, j'aime, je peux, je finis.

(2) X2 → Xx if X ends in au or eu
       → Xs otherwise

Examples: tu es, tu vaux, tu aimes.

(3) X3 → X if X ends in a, e, c, d or t
       → Xt otherwise

Examples: il a, il aime, il vainc, il vend, il bat, il finit.

(4) X4 → Xmes if X is stressed
       → Xons if X is unstressed

Examples: nous sommes, nous aimons.

(5) X5 → Xtes if X is stressed
       → Xez if X is unstressed

Examples: vous faites, vous aimez.

(6) X6 → Xnt if X ends in e or o
       → Xent otherwise

Examples: ils aiment, ils ont, ils finissent.
We hasten to point out that these same rewriting rules will be used in other tenses
later. For instance, while faites might be discounted as an exception, this method of
forming the fifth person ending will become the rule in the simple past.
The partition into stem plus ending often allows some leeway and may be based on
an arbitrary decision. For example, we have analyzed aiment as aime + nt rather than aim + ent, but we could equally well have adopted the second alternative. Rule (6) yields aiment in either case.
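The six rules are mechanical enough to transcribe directly. In the following sketch (ours) a trailing apostrophe marks a stressed stem, a makeshift substitute for the underlining used in the original:

```python
# A sketch of rewriting rules (1)-(6) for the present-tense endings.
def ending(stem, k):
    stressed = stem.endswith("'")
    x = stem.rstrip("'")
    if k == 1:
        if x.endswith(("au", "eu")): return x + "x"
        if x.endswith("a"): return x + "i"
        return x if x.endswith("e") else x + "s"
    if k == 2:
        return x + ("x" if x.endswith(("au", "eu")) else "s")
    if k == 3:
        return x if x[-1] in "aecdt" else x + "t"
    if k == 4:
        return x + ("mes" if stressed else "ons")
    if k == 5:
        return x + ("tes" if stressed else "ez")
    if k == 6:
        return x + ("nt" if x[-1] in "eo" else "ent")

print([ending("aime", k) for k in (1, 2, 3)])   # ['aime', 'aimes', 'aime']
print(ending("aim", 4), ending("peu", 1), ending("doi", 1))   # aimons peux dois
```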
In general, there are three present stems. Consider, for example, the present tense of devoir: je dois, tu dois, il doit, nous devons, vous devez, ils doivent. To all appearances, the first three persons are derived from the stem doi, the fourth and fifth persons from the stem dev, and the sixth person from the stem doiv. This is the general situation, although in many cases two of these stems coincide and some exceptional verbs such as être appear to have more than three stems in the present tense.

We shall denote the three present stems of the verb V as follows:

α(V), β(V), γ(V)

where the underlining again indicates stress. The full present tense is then normally obtained by the rewriting rules (1) to (6) from:

α(V)1, α(V)2, α(V)3, β(V)4, β(V)5, γ(V)6

For example,

α(devoir) → doi

and

α(devoir)1 → doi1 → dois

The present stems of many French verbs may be read off from Tables I and II, which will be found in Sections 10 and 11, respectively. These tables also contain some other data that will be discussed later.
8 The other tenses
Where the present tense depends on the stems

α(V), β(V), γ(V)

the imperfect tense uses instead the stems

β(V)ai, β(V)i, β(V)ai

In other words, the full imperfect tense is obtained by rules (1) to (6) from

β(V)ai1, β(V)ai2, β(V)ai3, β(V)i4, β(V)i5, β(V)ai6

Thus the imperfect reads:

β(V)ais, β(V)ais, β(V)ait, β(V)ions, β(V)iez, β(V)aient

The future and the conditional have the stems

φ(V)ra, φ(V)r, φ(V)ro

and

φ(V)rai, φ(V)ri, φ(V)rai

respectively. Here φ(V) is a new morpheme, which is closely related to the infinitive V. The reader will observe that rule (1) yields

φ(V)ra1 → φ(V)rai

in the future, but

φ(V)rai1 → φ(V)rais

for the conditional.

The present subjunctive makes use of the stems

γ'(V)ε, β'(V)ι, γ'(V)ε

Here γ'(V) = γ(V) and β'(V) = β(V) for all but nine of the ten exceptional verbs. Moreover, the Greek letters ε and ι are subject to the following rewriting rules:

(ι) Xι → Xy if X ends in a or o
       → Xi otherwise

Examples: que nous ayons, que nous soyons, que vous aimiez, que nous voyions.

(ε) Xε → X if X ends in e
       → Xi if X ends in a or o
       → Xe otherwise
Examples: que tu aimes, qu'il ait, qu'il sorte, que je meure
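In the same illustrative spirit (still assuming add_ending from the first sketch), rules (ι) and (ε) become:

    def iota(stem: str) -> str:
        """Xι → Xy if X ends in a or o, Xi otherwise."""
        return stem + ("y" if stem[-1] in "ao" else "i")

    def epsilon(stem: str) -> str:
        """Xε → X, Xi or Xe according to the final grapheme of X."""
        if stem.endswith("e"):
            return stem
        if stem[-1] in "ao":
            return stem + "i"
        return stem + "e"

    # que nous ayons and qu'il ait, both built on the stem a
    assert add_ending(iota("a"), 4, stressed=False) == "ayons"
    assert add_ending(epsilon("a"), 3) == "ait"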
The simple past has the stems

    ψ̲(V), ψ̲(V)^, ψ̲(V)r

where ψ̲(V) is another morpheme, one that is rapidly disappearing in spoken French.
Moreover, the circumflex is subject to the following rewriting rule:

    X^ →  X                                                  if the stressed vowel of X has a circumflex
          X with the circumflex placed on the stressed vowel otherwise

Examples: nous aimâmes, vous vîntes, nous crûmes (from croître as well as croire).
Finally, the past subjunctive has six persons

    ψ(V)sse, ψ(V)sses, ψ(V)^t, ψ(V)ssions, ψ(V)ssiez, ψ(V)ssent

where ψ(V) is the same as ψ̲(V), only without stress. With a little bit of cheating, we
may reduce these six persons to three stems as follows:

    ψ(V)[sse], ψ(V)ssi, ψ(V)sse

provided [sse] is subject to the following rewriting rules:

    [sse]k →  ^k    if k = 3
              ssek  if k = 1 or 2

Note that we rewrite

    aima[sse]3 → aima^3 → aimâ3 → aimât

Had there been no circumflex on the a, rule (3) would have eliminated the final t.
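The circumflex rule and the [sse] bookkeeping can be sketched the same way. This assumes that the stressed vowel of a stem is its last vowel, which holds for all the examples in the text; add_circumflex and past_subjunctive_third are our own names, and add_ending is the earlier sketch.

    CIRCUMFLEX = {"a": "â", "e": "ê", "i": "î", "o": "ô", "u": "û"}

    def add_circumflex(stem: str) -> str:
        """X^ → X with a circumflex on its last vowel, unless already there."""
        for i in range(len(stem) - 1, -1, -1):
            if stem[i] in "âêîôû":                 # already circumflexed
                return stem
            if stem[i] in "aeiou":
                return stem[:i] + CIRCUMFLEX[stem[i]] + stem[i + 1:]
        return stem

    def past_subjunctive_third(psi: str) -> str:
        """ψ(V)[sse]3 → ψ(V)^3: circumflex first, then rule (3) adds the t."""
        return add_ending(add_circumflex(psi), 3)

    assert add_circumflex("vin") == "vîn"               # vous vîntes
    assert add_circumflex("crû") == "crû"               # nous crûmes
    assert past_subjunctive_third("aima") == "aimât"    # qu'il aimât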
9 Factorizing the conjugation matrix
Summarizing our considerations up to this point, we find that the conjugation matrix
C(V) may be obtained by various rewriting rules from the following matrix C'(V), whose
rows correspond to the seven tenses discussed above and whose columns to the six persons:

    α̲(V)1       α̲(V)2       α̲(V)3       β(V)4      β(V)5      γ̲(V)6
    β(V)ai1     β(V)ai2     β(V)ai3     β(V)i4     β(V)i5     β(V)ai6
    φ(V)ra1     φ(V)ra2     φ(V)ra3     φ(V)r4     φ(V)r5     φ(V)ro6
    φ(V)rai1    φ(V)rai2    φ(V)rai3    φ(V)ri4    φ(V)ri5    φ(V)rai6
    γ'(V)ε1     γ'(V)ε2     γ'(V)ε3     β'(V)ι4    β'(V)ι5    γ'(V)ε6
    ψ̲(V)1       ψ̲(V)2       ψ̲(V)3       ψ̲(V)^4     ψ̲(V)^5     ψ̲(V)r6
    ψ(V)[sse]1  ψ(V)[sse]2  ψ(V)[sse]3  ψ(V)ssi4   ψ(V)ssi5   ψ(V)sse6
Thus we have C'_ik(V) → C_ik(V).
To a mathematician it is apparent that the matrix C'(V) may be factored as follows
(a dot marks an empty place):

    ( α̲(V)       β(V)      γ̲(V)    )
    ( β(V)ai     β(V)i     β(V)ai  )
    ( φ(V)ra     φ(V)r     φ(V)ro  )      ( 1  2  3  .  .  . )
    ( φ(V)rai    φ(V)ri    φ(V)rai )  ×   ( .  .  .  4  5  . )
    ( γ'(V)ε     β'(V)ι    γ'(V)ε  )      ( .  .  .  .  .  6 )
    ( ψ̲(V)       ψ̲(V)^     ψ̲(V)r   )
    ( ψ(V)[sse]  ψ(V)ssi   ψ(V)sse )
We note that, even though we are in a semigroup and addition has not been defined,
it is possible to multiply two matrices if, as in the present case, each row of the first meets
each column of the second in at most one non-empty place.
Whether this factorization is indicative of the internal logic of the French language,
or whether it is a mere fluke, as linguists would tend to say, there is no harm if the
mathematically inclined reader makes use of it as an aid to memory.
Let us write

    P_k = kth person
    T_i = ith tense

Then the importance of the matrix C'(V) derives from the 42 rewriting rules

    P_k T_i V → C'_ik(V)

The significance of these rules will only become apparent when one wishes to construct
sentences, as in Section 12 below.
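As an illustration of how the 42 rules work together, the factorization can be animated in code: the 7 by 3 matrix becomes a table of stem triples keyed by tense, and the sparse second factor becomes the choice of a column by person. The sketch below covers only five of the seven tenses, ignores the starred letters and the bracketed allomorphs, and reuses add_ending and add_circumflex from the earlier sketches.

    def stem_matrix(alpha, beta, gamma, phi, psi):
        """One triple of stems per tense: persons 1-3, persons 4-5, person 6."""
        return {
            "present":     (alpha,       beta,                gamma),
            "imperfect":   (beta + "ai", beta + "i",          beta + "ai"),
            "future":      (phi + "ra",  phi + "r",           phi + "ro"),
            "conditional": (phi + "rai", phi + "ri",          phi + "rai"),
            "simple past": (psi,         add_circumflex(psi), psi + "r"),
        }

    def conjugate(radicals, tense, person):
        left, mid, right = stem_matrix(*radicals)[tense]
        stem = left if person <= 3 else mid if person <= 5 else right
        # persons 4 and 5 take the stressed endings -mes, -tes only here
        return add_ending(stem, person, stressed=(tense == "simple past"))

    aimer = ("aime", "aim", "aime", "aime", "aima")  # α, β, γ, φ, ψ; star dropped
    print(conjugate(aimer, "future", 6))             # aimeront
    print(conjugate(aimer, "simple past", 1))        # aimai
    print(conjugate(aimer, "simple past", 4))        # aimâmes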
A more radical factorization of the matrix C'(V), into a column vector and a row vector,
is possible. The column vector is the middle column of the above 7 by 3 matrix, and the
row vector has the form

    (Σ1  Σ2  Σ3  4  5  Σ'6)

where Σ and Σ' are certain "stress operators". But this would necessitate introducing a
whole lot of new rewriting rules, and we shall desist from such a project here.
10 Radicals of ordinary verbs
We observe that the matrix C'(V) discussed in Section 9 depends on 5 parameters called
"radicals":

    α̲(V), β(V), γ̲(V), φ(V), ψ̲(V)

We are not counting ψ(V), which differs from ψ̲(V) only by the absence of stress. We
are also not counting β'(V) and γ'(V), which differ from β(V) and γ̲(V) only for nine
exceptional verbs.
A reduction in the number of parameters to 3 is possible in view of the following
observations:

(i) With very few exceptions, φ(V) is obtained from the infinitive V by removing the
final r or re.

(ii) With very few exceptions, γ̲(V) has the same consonants as β(V) and the same
stress and vowels as α̲(V).

In this connection, one may also observe that very often α̲(V) can be obtained from
φ(V) by adding stress.
We shall refrain from giving a precise formulation of these two observations, so we
are left with 5 radicals. The five radicals of a number of ordinary verbs are listed in
Table I below. It is understood that the verbs listed serve as models for many verbs not
listed. In particular, this is so for aimer and finir, which represent the so-called "living
conjugations". Table I does not include the 10 exceptional verbs.
Table I

    V         α̲(V)      β(V) = β'(V)   γ̲(V) = γ'(V)   φ(V)      ψ̲(V)
    aimer     aime      aim            aime           aime      aima*
    appeler   appelle   appel          appelle        appelle   appela*
    manger    mange     mang*          mange          mange     mang*a*
    placer    place     plac*          place          place     plac*a*
    finir     fini      finiss         finiss         fini      fini
    haïr      hai       haïss          haïss          haï       haï
    mourir    meur      mour           meur           mour      mouru
    sortir    sor       sort           sort           sorti     sorti
    venir     vien      ven            vienn          viend     vin
    devoir    doi       dev            doiv           dev       du
    voir      voi       voy            voi            ver       vi
    croître   croî      croiss         croiss         croît     crû
    naître    nai*      naiss          naiss          naît      naqui
    prendre   prend     pren           prenn          prend     pri
    vendre    vend      vend           vend           vend      vendi
    vaincre   vainc     vainqu         vainqu         vainc     vainqui
    croire    croi      croy           croi           croi      cru
    écrire    écri      écriv          écriv          écri      écrivi
    lire      li        lis            lis            li        lu
    plaire    plai*     plais          plais          plai      plu
Table I should be thought of as a condensed list of rewriting rules. For example, the
first entry represents the rule

    α̲(aimer) → aime

The starred letters in Table I signal some minor irregularities, mainly of an ortho-
graphic nature:

    a* becomes è before r
    g* becomes ge before a or o
    c* becomes ç before a or o
    i* becomes î before t

Examples: ils aimèrent, nous mangeons, je plaçais, il plaît.
Of course, these remarks can be phrased as rewriting rules. For example, the first one
becomes

    a*Y →  èY  if Y starts with r
           aY  otherwise

It should be pointed out that, when applying rules (1) to (6), the stem X may contain
a star. Thus we have

    β(manger)4 → mang*4 → mang*ons → mangeons
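The starred letters lend themselves to a post-processing step in the same sketch, with one-shot regular expressions standing in for the text's context-sensitive rules; apply_stars is our own name.

    import re

    def apply_stars(form: str) -> str:
        """Resolve starred letters once an ending has been attached."""
        form = re.sub(r"a\*(?=r)", "è", form)      # a* becomes è before r
        form = re.sub(r"g\*(?=[ao])", "ge", form)  # g* becomes ge before a or o
        form = re.sub(r"c\*(?=[ao])", "ç", form)   # c* becomes ç before a or o
        form = re.sub(r"i\*(?=t)", "î", form)      # i* becomes î before t
        return form.replace("*", "")               # elsewhere the star just drops

    assert apply_stars("mang*ons") == "mangeons"   # nous mangeons
    assert apply_stars("plac*ais") == "plaçais"    # je plaçais
    assert apply_stars("aima*rent") == "aimèrent"  # ils aimèrent
    assert apply_stars("plai*t") == "plaît"        # il plaît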
11 Radicals of exceptional verbs
The radicals of the 10 exceptional verbs are listed in Table II, with separate listings for
β'(V) and γ'(V). Note that falloir is incomplete, as it only possesses forms in the third
person.
TABLE II

    V         α̲(V)   β(V)     β'(V)   γ̲(V)   γ'(V)    φ(V)   ψ̲(V)
    aller     [va]   all      all     vo     aill     i      alla*
    avoir     a      av       a       o      [ai]     au     eu
    falloir   fau    fall     —       —      faill    faud   fallu
    pouvoir   peu    pouv     puiss   peuv   puiss    pour   pu
    savoir    sai    sav      sach    sav    sach     sau    su
    valoir    vau    val      val     val    vaill    vaud   valu
    vouloir   veu    voul     voul    veul   veuill   voud   voulu
    être      [es]   [et]     so      so     so       se     fu
    faire     fai    [fais]   fass    fo     fass     fe     fi
    dire      di     [dis]    dis     dis    dis      di     di
We may regard β'(V) and γ'(V) as alternative forms (allomorphs) of β(V) and γ̲(V)
respectively. Other morphemes may possess alternative forms too, and when this is the
case we have placed them in square brackets. The square brackets may be removed
according to the following instructions, with a change of form in the specified contexts.
The understanding is that, in contexts not mentioned explicitly, the square brackets may
be removed without change.
    [va] becomes vai before 1
    [es] becomes sui before 1
    [et] becomes som before 4
    [et] becomes ê before 5
    [fais] becomes fai before 5
    [dis] becomes di before 5
    [ai] becomes a before ε3

(The replacement forms count as stressed, so that rules (4) and (5) yield sommes, êtes,
faites and dites.)
Altogether there are 7 exceptional forms: je vais, je suis, nous sommes, vous êtes, vous
faites, vous dites, qu'il ait.
We may of course rephrase the above instructions for removing square brackets as
rewriting rules. For example, the first one becomes:

    [va]k →  vaik  if k = 1
             vak   otherwise
It should be pointed out that we are not allowed to apply rules (1) to (6) to an
expression containing square brackets, and the same goes for rule (ε). Thus

    α̲(aller)1 → [va]1 → vai1 → vais

and rule (1) may not be applied to [va]1 directly. Similarly

    γ'(avoir)ε3 → [ai]ε3 → aε3 → ai3 → ait

and rule (ε) may not be applied to [ai]ε3 directly.
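The ordering constraint is easy to mimic in the running sketch: a bracketed stem is resolved against the following person mark before any of rules (1) to (6) may touch it. Only a few of the instructions are encoded below; resolve_brackets is our own name, and add_ending is the first sketch.

    def resolve_brackets(stem: str, person: int) -> str:
        """Remove square brackets, changing the form in the listed contexts."""
        table = {("[va]", 1): "vai",
                 ("[es]", 1): "sui",
                 ("[et]", 4): "som",
                 ("[fais]", 5): "fai"}
        return table.get((stem, person), stem.strip("[]"))

    # Rule (1) may not touch [va]1; after resolution it applies as usual.
    assert add_ending(resolve_brackets("[va]", 1), 1) == "vais"      # je vais
    assert add_ending(resolve_brackets("[et]", 4), 4) == "sommes"    # nous sommes
    assert add_ending(resolve_brackets("[fais]", 5), 5) == "faites"  # vous faites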
12 A production grammar for a fragment of French
Our investigation has been conducted in the language of "production grammars". Ac-
tually, we are considering the free semigroup generated by an alphabet made up of two
parts:

(a) the terminal alphabet, consisting of French graphemes in sans serif, sometimes with
underlining to indicate stress;

(b) an auxiliary alphabet, consisting of grammatical terms, in the present case the fol-
lowing symbols:

    1, ..., 6, α, ..., ψ, ε, ι, ^, [, ], *, P_1, ..., P_6, T_1, ..., T_7
The free semigroup is equipped with a binary relation → which satisfies the usual
reflexive, transitive and substitution laws, as well as all the rewriting rules that have been
introduced in Sections 7 to 11, specifically the following rules:

(i) for 1 to 6 in Section 7,
(ii) for ε, ι, ^ and [sse] in Section 8,
(iii) for P_k T_i in Section 9,
(iv) for α to ψ in Table I and Table II,
(v) for the star in Section 10,
(vi) for the square brackets in Section 11.
What one is really interested in is the formation of sentences. For expository purposes,
we shall here confine attention to a small part of the French language which contains a
few declarative sentences made up out of personal pronouns followed by inflected forms
of intransitive verbs and also of the corresponding subjunctive clauses.
We need some additional symbols:

    S       = sentence
    S_i     = subjunctive clause in the ith tense (i = 5, 7)
    Subj_k  = subject in the kth person
    Pred_i  = predicate in the ith tense
    V_intr  = intransitive verb
We also postulate the following new rewriting rules:

    S → Subj_k Pred_i           (i = 1, 2, 3, 4, 6)
    S_i → que* Subj_k Pred_i    (i = 5, 7)
    Subj_1 → je* P_1
    Subj_2 → tu P_2
    Subj_3 → il P_3, elle P_3, on P_3
    Subj_4 → nous P_4
    Subj_5 → vous P_5
    Subj_6 → ils P_6, elles P_6
    V_intr → aller, sortir, venir, ...
    Pred_i → T_i V_intr

as well as the following elision rule:

    je* X →  j'X   if X starts with a vowel
             je X  if X starts with a consonant

and similarly for que*, which becomes qu' before a vowel.
Unless otherwise specified, i ranges from 1 to 7 and k from 1 to 6.
A few examples will make it clear how declarative sentences and subjunctive clauses
are generated by the above production grammar.
(1) S → Subj_1 Pred_1
      → je* P_1 T_1 V_intr
      → je* P_1 T_1 aller
      → je* α̲(aller)1
      → je* [va]1
      → je* vai1
      → je* vais
      → je vais

(2) S → Subj_5 Pred_2
      → vous P_5 T_2 V_intr
      → vous P_5 T_2 sortir
      → vous β(sortir)i5
      → vous sorti5
      → vous sortiez

(3) S → Subj_4 Pred_3
      → nous P_4 T_3 V_intr
      → nous P_4 T_3 mourir
      → nous φ(mourir)r4
      → nous mourr4
      → nous mourrons

(4) S_5 → que* Subj_2 Pred_5
        → que* tu P_2 T_5 V_intr
        → que* tu P_2 T_5 aller
        → que* tu γ'(aller)ε2
        → que* tu aillε2
        → que tu aille2
        → que tu ailles

(5) S → Subj_6 Pred_6
      → ils P_6 T_6 V_intr
      → ils P_6 T_6 venir
      → ils ψ̲(venir)r6
      → ils vinr6
      → ils vinrent

(6) S_7 → que* Subj_3 Pred_7
        → que* il P_3 T_7 V_intr
        → qu'il P_3 T_7 venir
        → qu'il ψ(venir)^3
        → qu'il vin^3
        → qu'il vîn3
        → qu'il vînt
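To close the circle, derivations like (2) and (3) can be replayed end to end with the earlier sketches (conjugate from the sketch in Section 9, the radicals of sortir and mourir from Table I; tense 2 is the imperfect, tense 3 the future). The pronoun handling below is a naive stand-in for the je* and que* rules.

    sortir = ("sor", "sort", "sort", "sorti", "sorti")   # radicals from Table I
    mourir = ("meur", "mour", "meur", "mour", "mouru")

    def sentence(pronoun: str, radicals, tense: str, person: int) -> str:
        verb = conjugate(radicals, tense, person)
        if pronoun == "je" and verb[0] in "aeiouéèê":    # crude je* elision
            return "j'" + verb
        return pronoun + " " + verb

    print(sentence("vous", sortir, "imperfect", 5))  # vous sortiez
    print(sentence("nous", mourir, "future", 4))     # nous mourrons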